Abstract

The paper formalizes and extends the idea of local structure approximation for
cellular automata, originally proposed by Gutowitz et al. [1]. We start with a review
of the construction of a probability measure on the set of bi-infinite strings over a finite
alphabet of $N$ symbols. We then demonstrate that for a shift-invariant probability
measure, probabilities of all blocks of length up to $k$ can be expressed by $(N-1)N^{k-1}$
linearly independent block probabilities. Two choices of these independent blocks are
discussed in detail, one in which we choose the longest possible blocks ("long form") and
one in which we choose the shortest possible blocks ("short form"). We then proceed to
review the method which allows one to approximate probabilities of blocks longer than $k$ by
probabilities of blocks of length $k$ or less. This approximation, known as Bayesian extension
or Markov measure, is then used to construct approximate orbits of shift-invariant probability
measures under the action of a probabilistic or deterministic cellular automaton. We show
that the aforementioned approximate orbit is completely determined by an
$(N-1)N^{k-1}$-dimensional map. When the short form of block probabilities is used, this map
takes a particularly simple form, often revealing important features of a particular cellular
automaton.

1. Introduction 

Cellular automata (CA) are often considered as maps in the space of Borel shift-invariant
probability measures equipped with the weak* topology [2, 3, 4, 5]. The central problem of
the theory of cellular automata in this setting is to determine properties of orbits of given
initial measures $\mu$ under the action of a given cellular automaton. Since computing the orbit
of a measure is in general very difficult, approximate methods have been considered. The
simplest of these methods is called the mean-field theory, and has its origins in statistical
physics [6]. The main idea behind the mean-field theory is to approximate the consecutive
iterations of the initial measure by Bernoulli measures. While this approximation is obviously
very crude, it is sometimes quite useful in applications.

In 1987, H. A. Gutowitz, J. D. Victor, and B. W. Knight [1] proposed a generalization of 
the mean-field theory for cellular automata which, unlike mean-field theory, takes (partially) 
into account correlations between sites. The basic idea of local structure theory is to consider 
probabilities of blocks of length k and to construct a map on these block probabilities, which, 
when iterated, approximates probabilities of occurrence of the same blocks in the actual 
orbit of a given cellular automaton. The construction was based on the idea of "Bayesian 
extension", introduced earlier by other authors in the context of lattice gases [7, 8], and also
known as a "finite-block measure" or as "Markov process with memory". 

In the original paper, Gutowitz et al. made a compelling argument that "the local
structure theory appears to be a powerful method for characterization and classification of
cellular automata" [1]. After performing extensive Monte Carlo simulations and statistical
analysis they concluded that the local structure "is an accurate model of several aspects
of cellular automaton evolution. The dependence on initial conditions and convergence
properties are well modeled by the theory. It appears that, even for complex rules, the stable
invariant measures of a cellular automaton may be estimated to arbitrary resolution" [1].

In the last 25 years, the local structure theory has been applied to study various aspects of 
dynamics of both deterministic and probabilistic cellular automata, including, for example, 
such topics as classification of CA, phase transitions in probabilistic CA, CA models of traffic 
flow, asynchronous CA, and many others. In spite of this, there has been virtually no effort
to study this theory from a more formal point of view, in order to obtain rigorous results
which could be confronted with Monte Carlo experiments and numerical simulations. This
paper is intended to be a first step toward filling this gap.

The paper is organized as follows. In Section 2, we review the classic construction of
measures on $A^{\mathbb{Z}}$, where $A = \{0, 1, \ldots, N-1\}$, using cylinder sets and the
Hahn-Kolmogorov extension theorem. In the next section we show that for a shift-invariant
measure, measures of all cylinder sets of length up to $k$, which we call "block probabilities",
can be generated by $(N-1)N^{k-1}$ linearly independent block probabilities. We describe two
choices of these independent block probabilities, the "long form" and the "short form". In
Section 4 we show how the knowledge of measures of cylinder sets of length up to $k$ can be
used to approximate the entire measure. This construction is sometimes known as the
"maximal entropy" extension. We present a proof of the maximality of the entropy following
the idea given in [9] and adapted to our formalism.

The maximal entropy extension is then used to construct an approximate orbit of a measure
$\mu$ under the action of a cellular automaton. Points of this orbit are entirely determined by
$(N-1)N^{k-1}$ block probabilities, thus it is possible to generate approximate orbits by iterating
$(N-1)N^{k-1}$-dimensional real maps, instead of the much more complicated $N^k$-dimensional
maps proposed in [1]. We also show that, as $k$ increases, every point of the approximate orbit
weakly converges to the corresponding point of the exact orbit.

Finally, we present some examples of local structure maps and their reduced form. 



2. Construction of a probability measure 

Let $A = \{0, 1, \ldots, N-1\}$ be called an alphabet, or a symbol set, and let
$X = A^{\mathbb{Z}}$. The Cantor metric on $X$ is defined as $d(x, y) = 2^{-k}$, where
$k = \min\{|i| : x_i \neq y_i\}$. $X$ with the metric $d$ is a Cantor space, that is, a compact,
totally disconnected and perfect metric space. A finite sequence of elements of $A$,
$\mathbf{b} = b_1 b_2 \ldots b_n$, will be called a block (or word) of length $n$. The set of all
blocks of elements of $A$ of all possible lengths will be denoted by $A^*$. The elementary
cylinder set generated by the block $\mathbf{b} = b_1 b_2 \ldots b_n$ and anchored at $i$ is defined
as

$$[\mathbf{b}]_i = \{x \in A^{\mathbb{Z}} : x_{[i,i+n)} = \mathbf{b}\}, \tag{1}$$

where $x_{[i,i+n)}$ denotes $x_i x_{i+1} \ldots x_{i+n-1}$ and where we require that one of the
indices $i, i+1, \ldots, i+n-1$ is equal to zero, or, equivalently, that $-n+1 \le i \le 0$. For a
given elementary cylinder set $[\mathbf{b}]_i$, indices $i, i+1, \ldots, i+n-1$ will be called
fixed, while all other indices will be called free. The requirement $-n+1 \le i \le 0$,
therefore, means that the origin is always fixed.^1 The collection (class) of all elementary
cylinder sets of $X$ together with the empty set and the whole space $X$ will be denoted by
$Cyl(X)$. We will use the convention that for $\mathbf{b} = \varnothing$, $[\mathbf{b}]_i = X$.

Let $[\mathbf{a}]_i$ and $[\mathbf{b}]_j$ be two elementary cylinder sets. We will say that
$p \in \mathbb{Z}$ is a matching (mismatching) index of these cylinder sets if for every
$x \in [\mathbf{a}]_i$, $y \in [\mathbf{b}]_j$ we have $x_p = y_p$ ($x_p \neq y_p$). An index
which is either matching or mismatching will be called an overlapping index. Note that since
we require that the origin is fixed, any two cylinder sets must have at least one overlapping
index.

Proposition 2.1 The collection of all elementary cylinder sets together with the empty set
and the whole space constitutes a semialgebra over $X$.

Proof: In order to show that elementary cylinder sets constitute a semialgebra we need to
prove (i) the closure under the intersection and (ii) that the set difference of two elementary
cylinder sets is a finite union of elementary cylinder sets.

For (i), let $[\mathbf{a}]_i$ and $[\mathbf{b}]_j$ be two elementary cylinder sets. As such, they
must have some overlapping indices. If all overlapping indices are matching, then
$[\mathbf{a}]_i \cap [\mathbf{b}]_j$ is just the elementary cylinder set generated by the overlapped
concatenation of $\mathbf{a}$ and $\mathbf{b}$. If among overlapping indices there is at least one
mismatching index, then $[\mathbf{a}]_i \cap [\mathbf{b}]_j$ is empty.

For (ii), let us observe that, for a block $\mathbf{b}$ of length $n$,

$$X \setminus [\mathbf{b}]_i = \bigcup_{j=0}^{n-1} \{x \in X : x_{i+j} \neq b_{j+1}\}. \tag{2}$$

Each of the sets $\{x \in X : x_{i+j} \neq b_{j+1}\}$ can be expressed as a union of elementary
cylinder sets, thus $X \setminus [\mathbf{b}]_i$ is also a union of elementary cylinder sets. Now,
since

$$[\mathbf{a}]_i \setminus [\mathbf{b}]_j = [\mathbf{a}]_i \cap (X \setminus [\mathbf{b}]_j), \tag{3}$$

and (i) holds, we obtain the desired result. □
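Part (i) of the proof can be illustrated with a minimal sketch. The representation below (a cylinder set stored as a dictionary from fixed indices to symbols) and the function names `cylinder` and `intersect` are illustrative assumptions, not notation from the paper; the empty block is not handled.

```python
def cylinder(block, anchor):
    """Elementary cylinder set [b]_i, stored as {fixed index: symbol}."""
    n = len(block)
    if not (-n + 1 <= anchor <= 0):
        raise ValueError("the origin must be a fixed index")
    return {anchor + j: s for j, s in enumerate(block)}


def intersect(c1, c2):
    """Intersection of two cylinder sets, or None if it is empty."""
    for p in c1.keys() & c2.keys():      # overlapping indices
        if c1[p] != c2[p]:               # a mismatching index
            return None
    return {**c1, **c2}                  # overlapped concatenation


a = cylinder([0, 1, 1], -1)   # fixes x_{-1}=0, x_0=1, x_1=1
b = cylinder([1, 1, 0], 0)    # fixes x_0=1, x_1=1, x_2=0
c = cylinder([0, 0], 0)       # fixes x_0=0, x_1=0
print(intersect(a, b))        # {-1: 0, 0: 1, 1: 1, 2: 0}
print(intersect(a, c))        # None
```

Because both cylinder sets fix the origin, the merged set of fixed indices is always a contiguous range containing zero, so the result is again an elementary cylinder set.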



^1 Other choices of elementary cylinder sets are possible, not requiring fixed origin - see, for example, [5].
Our choice is motivated by the desire that the set of elementary cylinder sets is closed under intersection.



We will now introduce the notion of a measure on the semialgebra of cylinder sets. Let
$\mathcal{D}$ be a semialgebra. A map $\mu : \mathcal{D} \to [0, \infty]$ is called a measure on
$\mathcal{D}$ if it is countably additive and $\mu(\varnothing) = 0$. By countable additivity we
mean that for any sequence $\{A_i\}_{i=1}^{\infty}$ of pairwise disjoint sets belonging to
$\mathcal{D}$ such that $\bigcup_{i=1}^{\infty} A_i \in \mathcal{D}$,

$$\mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mu(A_i). \tag{4}$$

For measures on the semialgebra of cylinder sets, countable additivity is implied by finite
additivity.

Proposition 2.2 Any finitely additive map $\mu : Cyl(X) \to [0, \infty]$ for which
$\mu(\varnothing) = 0$ is a measure on the semialgebra of elementary cylinder sets $Cyl(X)$.

Proof: We start with a remark that in the Cantor topology elementary cylinder sets are
clopen, that is, both closed and open.

Suppose now that the map $\mu$ satisfies $\mu(\varnothing) = 0$ and is finitely additive, that is,
for any finite sequence $\{A_i\}_{i=1}^{m}$ of pairwise disjoint sets belonging to $Cyl(X)$ such
that $\bigcup_{i=1}^{m} A_i \in Cyl(X)$,

$$\mu\left(\bigcup_{i=1}^{m} A_i\right) = \sum_{i=1}^{m} \mu(A_i). \tag{5}$$

In order to show that $\mu$ is a measure on $Cyl(X)$, we need to show that it is countably
additive. Let $B$ be a cylinder set and let $\{A_i\}_{i=1}^{\infty}$ be a collection of pairwise
disjoint cylinder sets such that $\bigcup_{i=1}^{\infty} A_i = B$. Since $B$ is closed, it is also
compact. Sets $A_i$ are open, and form a cover of the compact set $B$. There must exist,
therefore, a finite subcover, that is, a finite number of sets $A_i$ covering $B$. Moreover, since
the $A_i$ are mutually disjoint, there must exist $m$ such that $A_i = \varnothing$ for all $i > m$,
and therefore $B = \bigcup_{i=1}^{m} A_i$. Then by finite additivity of $\mu$ and the assumption
that $\mu(\varnothing) = 0$ we obtain

$$\mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \mu\left(\bigcup_{i=1}^{m} A_i\right) = \sum_{i=1}^{m} \mu(A_i) = \sum_{i=1}^{\infty} \mu(A_i), \tag{6}$$

which means that $\mu$ is countably additive and thus is a measure on $Cyl(X)$, as required. □
Although the above proposition allows us to introduce a measure on the semialgebra of
elementary cylinder sets, this semialgebra is "too small" a class of subsets of $X$ to support
the full machinery of probability theory. For this we need a $\sigma$-algebra, that is, a class of
subsets of $X$ that is closed under the complement and countable unions of its members. Such
a $\sigma$-algebra can be defined as an "extension" of $Cyl(X)$. The smallest $\sigma$-algebra
containing $Cyl(X)$ will be called the $\sigma$-algebra generated by $Cyl(X)$. As it turns out, it
is possible to extend a measure on a semialgebra to the $\sigma$-algebra generated by it, as the
following theorem attests.



Theorem 2.1 (Hahn-Kolmogorov) Let $\mu : \mathcal{D} \to [0, \infty]$ be a measure on a
semialgebra $\mathcal{D}$ of subsets of a set $Y$. Then $\mu$ can be extended to a measure on the
$\sigma$-algebra generated by $\mathcal{D}$.



This classic result was first proved by M. Fréchet [9], and later by A. Kolmogorov
[10] and H. Hahn [11]. One can find its contemporary proof in ref. [12]. The proof is based
on the construction of the outer measure $\mu^*$ determined by $\mu$, followed by an application
of Carathéodory's extension theorem. Since the proof bears little relevance to our subsequent
considerations, it will be omitted here.

One can also show that the extension is unique if $\mu$ satisfies additional conditions.
Without discussing this issue in full generality, we will only state that for probabilistic
measures, that is, measures satisfying $\mu(X) = 1$, the extension is always unique [12]. In all
subsequent considerations, we will assume that the measure is probabilistic, and the set of all
probabilistic measures on the $\sigma$-algebra generated by elementary cylinder sets of $X$ will
be denoted by $\mathfrak{M}(X)$.

The Hahn-Kolmogorov Theorem coupled with Proposition 2.2 results in the following
corollary, which summarizes our discussion.

Corollary 2.1 Any finitely additive map $\mu : Cyl(X) \to [0, 1]$ satisfying
$\mu(\varnothing) = 0$ and $\mu(X) = 1$ extends uniquely to a measure on the $\sigma$-algebra
generated by elementary cylinder sets of $X$.

The last thing we need to do is to characterize finite additivity of maps on $Cyl(X)$ in
somewhat simpler terms. Recall that $\mu : Cyl(X) \to [0, 1]$ is finitely additive if for every
$B \in Cyl(X)$ and pairwise disjoint $A_i \in Cyl(X)$, $i = 1, 2, \ldots, m$, such that
$B = \bigcup_{i=1}^{m} A_i$, we have $\mu(B) = \sum_{i=1}^{m} \mu(A_i)$. If $B$ is a cylinder set,
when could it be a union of a finite number of other pairwise disjoint cylinder sets? From the
definition of the cylinder set, it is clear that if $B$ is a finite union of $A_i$, then each $A_i$
must be generated by a longer block than $B$, and for each pair $(B, A_i)$ all fixed indices of
$B$ must be matching. For $B = [\mathbf{b}]_i$ this can happen in one of the following three
situations:

$$[\mathbf{b}]_i = \bigcup_{\mathbf{a} \in A^k} [\mathbf{b}\mathbf{a}]_i, \tag{7}$$

$$[\mathbf{b}]_i = \bigcup_{\mathbf{a} \in A^k} [\mathbf{a}\mathbf{b}]_{i-k}, \tag{8}$$

$$[\mathbf{b}]_i = \bigcup_{\mathbf{a} \in A^k,\ \mathbf{c} \in A^l} [\mathbf{a}\mathbf{b}\mathbf{c}]_{i-k}. \tag{9}$$

This means that we attach to $\mathbf{b}$ a postfix word, a prefix word, or both, and take the
union over all values of the attached word(s) of a given length. Note that all cylinder sets on
the right hand side of each of the above equations are pairwise disjoint. If we want to test
the map for countable additivity, it is thus sufficient to test it on cases described by
equations (7)-(9).

Proposition 2.3 The map $\mu : Cyl(X) \to [0, 1]$ is countably additive if and only if for all
$[\mathbf{b}]_i \in Cyl(X) \setminus X$,

$$\mu([\mathbf{b}]_i) = \sum_{a \in A} \mu([\mathbf{b}a]_i) = \sum_{a \in A} \mu([a\mathbf{b}]_{i-1}). \tag{10}$$

Proof: Suppose that the map is countably additive. Applying the additivity condition to
eqs. (7) and (8) with $\mathbf{a} = a$, a single symbol, yields the desired result.

Now suppose that the double equality (10) holds. Applying it recursively $k$ times we
obtain

$$\mu([\mathbf{b}]_i) = \sum_{a_1 \in A} \cdots \sum_{a_k \in A} \mu([\mathbf{b}a_1 a_2 \ldots a_k]_i) = \sum_{\mathbf{a} \in A^k} \mu([\mathbf{b}\mathbf{a}]_i), \tag{11}$$

which implies additivity of $\mu$ for the case covered by eq. (7). One can deal with cases
covered by eqs. (8) and (9) in a similar fashion. The map $\mu$ is thus countably additive on
$Cyl(X)$. □

Note that when $\mathbf{b} = \varnothing$, according to our convention, $[\mathbf{b}]_i = X$, and
eq. (10) reduces to

$$\sum_{a \in A} \mu([a]_i) = 1, \tag{12}$$

where we used the assumption that the measure is probabilistic, $\mu(X) = 1$.

3. Shift-invariant measure 

In the previous section we demonstrated that any map $\mu : Cyl(X) \to [0, 1]$ satisfying
$\mu(\varnothing) = 0$, $\mu(X) = 1$ and conditions of eq. (10) extends uniquely to a measure on
the $\sigma$-algebra generated by elementary cylinder sets of $X$. We will now impose another
condition on the map $\mu : Cyl(X) \to [0, 1]$, namely translational invariance (also called
shift-invariance), by requiring that, for all $\mathbf{b} \in A^*$, $\mu([\mathbf{b}]_i)$ is
independent of $i$. To simplify notation, we then define $P : A^* \to [0, 1]$ as

$$P(\mathbf{b}) := \mu([\mathbf{b}]_i). \tag{13}$$

Values $P(\mathbf{b})$ will be called block probabilities. Applying Proposition 2.3 and the
Hahn-Kolmogorov theorem to the case of shift-invariant $\mu$ we obtain the following result.

Theorem 3.1 Let $P : A^* \to [0, 1]$ satisfy the conditions

$$P(\mathbf{b}) = \sum_{a \in A} P(\mathbf{b}a) = \sum_{a \in A} P(a\mathbf{b}) \quad \text{for all } \mathbf{b} \in A^*, \tag{14}$$

$$1 = \sum_{a \in A} P(a). \tag{15}$$

Then $P$ uniquely determines a shift-invariant probability measure on the $\sigma$-algebra
generated by elementary cylinder sets of $X$.

The set of shift-invariant probability measures on the $\sigma$-algebra generated by elementary
cylinder sets of $X$ will be denoted by $\mathfrak{M}_{\sigma}(X)$. Conditions (14) and (15) are
often called consistency conditions. It should be stressed, however, that they are essentially
equivalent to measure additivity conditions. Nevertheless, since the term "consistency
conditions" is prevalent in the literature, we will use it in the subsequent considerations.

Since $P$ uniquely determines the probability measure, we can use block probability values
to define a shift-invariant probability measure. Obviously, because of consistency conditions,
block probabilities are not independent.
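As a concrete illustration of conditions (14) and (15), the following minimal sketch checks them for the block probabilities of a stationary two-state Markov chain. The transition matrix `T`, its stationary distribution `PI`, and the helper names are hypothetical choices made for this example only; exact rational arithmetic is used so that the equalities can be tested without rounding.

```python
from fractions import Fraction
from itertools import product

# Hypothetical two-state Markov chain: T[a][b] = Prob(next = b | current = a).
T = [[Fraction(3, 4), Fraction(1, 4)],
     [Fraction(1, 2), Fraction(1, 2)]]
# Stationary distribution of T (it satisfies PI = PI T).
PI = [Fraction(2, 3), Fraction(1, 3)]


def P(block):
    """Block probability of the stationary Markov measure; P(empty block) = 1."""
    if not block:
        return Fraction(1)
    prob = PI[block[0]]
    for a, b in zip(block, block[1:]):
        prob *= T[a][b]
    return prob


def is_consistent(P, n_symbols, max_len):
    """Check eq. (15), and eq. (14) for all blocks of length 1..max_len."""
    symbols = range(n_symbols)
    if sum(P((a,)) for a in symbols) != 1:           # eq. (15)
        return False
    for n in range(1, max_len + 1):
        for b in product(symbols, repeat=n):
            right = sum(P(b + (a,)) for a in symbols)
            left = sum(P((a,) + b) for a in symbols)
            if not (P(b) == right == left):          # eq. (14)
                return False
    return True


print(is_consistent(P, 2, 4))  # True
```

The left-hand sum in (14) is where stationarity matters: replacing `PI` by a non-stationary initial distribution makes the check fail.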



We will define $\mathbf{P}^{(k)}$ to be the column vector of all probabilities of blocks of length
$k$ arranged in lexical order. For example, for $A = \{0, 1\}$, these are

$$\mathbf{P}^{(1)} = [P(0), P(1)]^T,$$

$$\mathbf{P}^{(2)} = [P(00), P(01), P(10), P(11)]^T,$$

$$\mathbf{P}^{(3)} = [P(000), P(001), P(010), P(011), P(100), P(101), P(110), P(111)]^T,$$

$$\ldots$$

Using this notation, eq. (14) can be written as

$$\mathbf{P}^{(k-1)} = \mathbf{R}^{(k)}\mathbf{P}^{(k)} = \mathbf{L}^{(k)}\mathbf{P}^{(k)}, \tag{16}$$


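where $k > 1$ and where $\mathbf{L}^{(k)}$ and $\mathbf{R}^{(k)}$ are binary matrices with
$N^{k-1}$ rows and $N^k$ columns. In order to describe the structure of these matrices, let us
denote the identity matrix $N^{k-1} \times N^{k-1}$ by $\mathbf{I}$, and let $\mathbf{J}_m$ be an
$N^{k-1} \times N^{k-1}$ matrix in which the $m$-th row consists of all 1's, and all other
entries are 0. Then $\mathbf{L}^{(k)}$ and $\mathbf{R}^{(k)}$ can be written as

$$\mathbf{L}^{(k)} = [\underbrace{\mathbf{I}\ \mathbf{I}\ \ldots\ \mathbf{I}}_{N}], \tag{17}$$

$$\mathbf{R}^{(k)} = [\mathbf{J}_1\ \mathbf{J}_2\ \ldots\ \mathbf{J}_N]. \tag{18}$$

For example, for $N = 3$, we have

$$\mathbf{P}^{(2)} = [P(00), P(01), P(02), P(10), P(11), P(12), P(20), P(21), P(22)]^T, \tag{19}$$

$$\mathbf{P}^{(1)} = [P(0), P(1), P(2)]^T, \tag{20}$$

and eq. (16) for $k = 2$ becomes

$$\mathbf{P}^{(1)} =
\left[\begin{array}{ccc|ccc|ccc}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{array}\right] \mathbf{P}^{(2)}
=
\left[\begin{array}{ccc|ccc|ccc}
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1
\end{array}\right] \mathbf{P}^{(2)}. \tag{21}$$

Vertical lines illustrate partitioning of matrices $\mathbf{R}^{(2)}$ and $\mathbf{L}^{(2)}$ into
blocks of $\mathbf{J}$ and $\mathbf{I}$ type.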
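The matrices of eq. (16) can also be generated mechanically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the matrices are built directly from condition (14) (dropping the last or the first symbol of each block of length $k$) rather than from the block formulas (17)-(18).

```python
from itertools import product


def blocks(n_symbols, k):
    """All blocks of length k over {0, ..., n_symbols - 1}, in lexical order."""
    return list(product(range(n_symbols), repeat=k))


def consistency_matrices(n_symbols, k):
    """Return (R, L) such that P^(k-1) = R P^(k) = L P^(k), cf. eq. (16)."""
    short = {b: i for i, b in enumerate(blocks(n_symbols, k - 1))}
    long_ = blocks(n_symbols, k)
    R = [[0] * len(long_) for _ in short]
    L = [[0] * len(long_) for _ in short]
    for col, b in enumerate(long_):
        R[short[b[:-1]]][col] = 1   # P(b') = sum_a P(b'a): drop the last symbol
        L[short[b[1:]]][col] = 1    # P(b') = sum_a P(ab'): drop the first symbol
    return R, L


R, L = consistency_matrices(3, 2)
for row in R:
    print(row)
for row in L:
    print(row)
```

For $N = 3$ and $k = 2$ the printed rows reproduce the two matrices of eq. (21); for $k = 1$ both matrices reduce to a single row of ones.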

We can now make two remarks about matrices $\mathbf{R}^{(k)}$ and $\mathbf{L}^{(k)}$. First of
all, using eq. (16) recursively, we can express every $\mathbf{P}^{(m)}$ for $m \in [1, k)$ by
$\mathbf{P}^{(k)}$,

$$\mathbf{P}^{(m)} = \left(\prod_{i=m+1}^{k} \mathbf{L}^{(i)}\right) \mathbf{P}^{(k)}. \tag{22}$$

In the above, one could replace all (or only some) $\mathbf{L}$'s by $\mathbf{R}$'s, and the
equation would remain valid.

Secondly, note that both $\mathbf{L}^{(1)}$ and $\mathbf{R}^{(1)}$ are single row matrices with
all $N$ entries equal to 1. This implies that the product $\mathbf{L}^{(1)}\mathbf{L}^{(2)}$ is a
single row matrix with all $N^2$ entries equal to 1, and, in general, for any $k \ge 1$,

$$\prod_{i=1}^{k} \mathbf{L}^{(i)} = [\underbrace{1\ 1\ \ldots\ 1}_{N^k}]. \tag{23}$$

Again, one could replace here all (or some) $\mathbf{L}$'s by $\mathbf{R}$'s, and the equation
would remain valid. As a consequence of this, normalization condition (15) can be written as
$\mathbf{L}^{(1)}\mathbf{P}^{(1)} = 1$, or, replacing $\mathbf{P}^{(1)}$ by
$\mathbf{L}^{(2)}\mathbf{P}^{(2)}$, as $\mathbf{L}^{(1)}\mathbf{L}^{(2)}\mathbf{P}^{(2)} = 1$, etc.
In general, we can write the normalization condition in the form

$$\left(\prod_{i=1}^{k} \mathbf{L}^{(i)}\right) \mathbf{P}^{(k)} = 1, \tag{24}$$

which, of course, is equivalent to

$$\sum_{i=1}^{N^k} P^{(k)}_i = 1. \tag{25}$$

Naturally, this was to be expected, since it is a consequence of measure additivity and the
fact that

$$\bigcup_{\mathbf{b} \in A^k} [\mathbf{b}]_i = X. \tag{26}$$

After making the above remarks about consistency conditions and their matrix form, let
us turn our attention to the following problem. In order to fully describe a shift-invariant
probability measure one needs to know all block probabilities $\mathbf{P}^{(i)}$ with
$i = 1, 2, \ldots$, and make sure that they satisfy consistency conditions. In practical
applications, however, it is often impossible to know all block probabilities, and instead one
considers only a truncated sequence of block probabilities $\mathbf{P}^{(i)}$ for
$i = 1, 2, \ldots, k$. It is then important to know how many of these are truly independent.
The next proposition answers this question.

Proposition 3.1 Among all block probabilities constituting components of
$\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \ldots, \mathbf{P}^{(k)}$ only $(N-1)N^{k-1}$ are linearly
independent.

Proof: Let us first note that vector $\mathbf{P}^{(i)}$ has $N^i$ components. Collectively, in
$\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \ldots, \mathbf{P}^{(k)}$ we have, therefore,
$\sum_{i=1}^{k} N^i = (N^{k+1} - N)/(N - 1)$ block probabilities. However, since all
$\mathbf{P}^{(i)}$, $i \in [1, k)$, can be expressed in terms of $\mathbf{P}^{(k)}$ with the help
of eq. (22), we can treat all of $\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \ldots, \mathbf{P}^{(k-1)}$
as dependent. This leaves us with $\mathbf{P}^{(k)}$ with $N^k$ components. However, we also
have

$$\mathbf{R}^{(k)}\mathbf{P}^{(k)} = \mathbf{L}^{(k)}\mathbf{P}^{(k)}. \tag{27}$$

Matrices in the above have $N^{k-1}$ rows, thus we have $N^{k-1}$ equations for $N^k$
variables. Are all these equations independent? Both $\mathbf{L}^{(k)}$ and $\mathbf{R}^{(k)}$
have the property that the sum of each of their columns is 1. Thus if we add all equations of
(27), we obtain the identity $\sum_i P^{(k)}_i = \sum_i P^{(k)}_i$, meaning that the number of
independent equations in eq. (27) is $N^{k-1} - 1$. All of this takes care of consistency
conditions (14), but we also need to consider normalization condition (15) which, as remarked
earlier, can be written in an equivalent form as an equation involving components of
$\mathbf{P}^{(k)}$, that is, eq. (25). This additional equation increases our previously
obtained number of independent equations back to $N^{k-1}$. In the end, the number of
independent block probabilities, equal to the number of variables minus the number of
independent equations, is $N^k - N^{k-1} = (N - 1)N^{k-1}$. □
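The count in Proposition 3.1 can be checked numerically for small $N$ and $k$. The following sketch is only a sanity check, with hypothetical helper names: it assembles the rows of $\mathbf{R}^{(k)} - \mathbf{L}^{(k)}$ (built from condition (14)) together with the normalization row, computes the rank exactly over the rationals, and compares the number of free variables with $(N-1)N^{k-1}$.

```python
from fractions import Fraction
from itertools import product


def constraint_rows(n, k):
    """Rows of R^(k) - L^(k), eq. (27), followed by the normalization row, eq. (25)."""
    short = {b: i for i, b in enumerate(product(range(n), repeat=k - 1))}
    long_ = list(product(range(n), repeat=k))
    rows = [[Fraction(0)] * len(long_) for _ in short]
    for col, b in enumerate(long_):
        rows[short[b[:-1]]][col] += 1   # entry of R^(k)
        rows[short[b[1:]]][col] -= 1    # entry of -L^(k)
    rows.append([Fraction(1)] * len(long_))
    return rows


def rank(rows):
    """Rank of a matrix by exact Gauss-Jordan elimination."""
    rows = [r[:] for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                f = rows[i][col] / rows[rk][col]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk


for n, k in [(2, 2), (2, 3), (3, 2), (3, 3)]:
    free = n**k - rank(constraint_rows(n, k))
    print(n, k, free, (n - 1) * n**(k - 1))
```

In each printed line the last two numbers should coincide.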

Once we know how many independent block probabilities there are, we can express the
remaining block probabilities in terms of them. We need to choose which block probabilities
we declare to be independent. The following proposition describes a natural choice. Before
we state it, we need to introduce some additional notation. As explained in the proof of
Proposition 3.1, in the system of equations
$\mathbf{R}^{(k)}\mathbf{P}^{(k)} = \mathbf{L}^{(k)}\mathbf{P}^{(k)}$ only $N^{k-1} - 1$ equations
are independent. We can, therefore, remove one of them, for example, the last equation, and
replace it by the normalization condition $\sum_i P^{(k)}_i = 1$. This will result in

$$\mathbf{M}^{(k)}\mathbf{P}^{(k)} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}, \tag{28}$$

where the matrix $\mathbf{M}^{(k)}$ has been obtained from $\mathbf{R}^{(k)} - \mathbf{L}^{(k)}$ by
setting every entry in the last row of $\mathbf{R}^{(k)} - \mathbf{L}^{(k)}$ to 1. Let us now
partition $\mathbf{M}^{(k)}$ into two submatrices, so that the first $N^k - N^{k-1}$ columns of
it are called $\mathbf{A}^{(k)}$ and the remaining $N^{k-1}$ columns are called
$\mathbf{B}^{(k)}$, so that

$$\mathbf{M}^{(k)} = [\mathbf{A}^{(k)}\ \mathbf{B}^{(k)}]. \tag{29}$$

If we recall definitions of $\mathbf{L}^{(k)}$ and $\mathbf{R}^{(k)}$ in eqs. (17) and (18), we
can easily verify that

$$\mathbf{B}^{(k)} = \begin{bmatrix}
-1 & 0 & \cdots & 0 & 0 \\
0 & -1 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & -1 & 0 \\
1 & 1 & \cdots & 1 & 1
\end{bmatrix}, \tag{30}$$

so that $\mathbf{B}^{(k)}$ can be constructed from the zero $N^{k-1} \times N^{k-1}$ matrix by
placing $-1$'s on the diagonal, and then filling the last row with 1's. The structure of matrix
$\mathbf{A}^{(k)}$ is a bit more complicated,

$$\mathbf{A}^{(k)} = [\mathbf{J}_1\ \mathbf{J}_2\ \ldots\ \mathbf{J}_{N-1}] + [\underbrace{\mathbf{B}^{(k)}\ \mathbf{B}^{(k)}\ \ldots\ \mathbf{B}^{(k)}}_{N-1}], \tag{31}$$

where, as already defined, $\mathbf{J}_m$ is an $N^{k-1} \times N^{k-1}$ matrix in which the
$m$-th row consists of all 1's, and all other entries are equal to 0.

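Proposition 3.2 Let $\mathbf{P}^{(k)}$ be partitioned into two subvectors,
$\mathbf{P}^{(k)} = (\mathbf{P}^{(k)}_{\mathrm{Top}}, \mathbf{P}^{(k)}_{\mathrm{Bot}})$, where
$\mathbf{P}^{(k)}_{\mathrm{Top}}$ contains the first $N^k - N^{k-1}$ entries of
$\mathbf{P}^{(k)}$, and $\mathbf{P}^{(k)}_{\mathrm{Bot}}$ the remaining $N^{k-1}$ entries. Then

$$\mathbf{P}^{(k)}_{\mathrm{Bot}} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} - \left(\mathbf{B}^{(k)}\right)^{-1} \mathbf{A}^{(k)} \mathbf{P}^{(k)}_{\mathrm{Top}}. \tag{32}$$

Proof: We want to solve

$$[\mathbf{A}^{(k)}\ \mathbf{B}^{(k)}] \begin{bmatrix} \mathbf{P}^{(k)}_{\mathrm{Top}} \\ \mathbf{P}^{(k)}_{\mathrm{Bot}} \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \tag{33}$$

for $\mathbf{P}^{(k)}_{\mathrm{Bot}}$. Denoting the vector on the right hand side by $\mathbf{c}$
and performing block multiplication we obtain
$\mathbf{A}^{(k)}\mathbf{P}^{(k)}_{\mathrm{Top}} + \mathbf{B}^{(k)}\mathbf{P}^{(k)}_{\mathrm{Bot}} = \mathbf{c}$.
The matrix $\mathbf{B}^{(k)}$ is always invertible, and has the property
$(\mathbf{B}^{(k)})^{-1}\mathbf{c} = \mathbf{c}$. This leads to
$\mathbf{P}^{(k)}_{\mathrm{Bot}} = \mathbf{c} - (\mathbf{B}^{(k)})^{-1}\mathbf{A}^{(k)}\mathbf{P}^{(k)}_{\mathrm{Top}}$,
as desired. □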
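A minimal computational sketch of Proposition 3.2 follows. It makes two assumptions that should be kept in mind: the matrix $\mathbf{M}^{(k)}$ is assembled directly from condition (14) and the normalization row, rather than from the closed forms (30)-(31), and the system (33) is solved by exact Gauss-Jordan elimination instead of forming $(\mathbf{B}^{(k)})^{-1}$ explicitly. The function names and the sample values of `p_top` (taken from a stationary two-state Markov measure) are hypothetical.

```python
from fractions import Fraction
from itertools import product


def solve(mat, rhs):
    """Solve mat x = rhs exactly; mat must be square and invertible."""
    n = len(mat)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(mat, rhs)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                f = aug[i][col]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[col])]
    return [row[-1] for row in aug]


def bottom_from_top(n, k, p_top):
    """Recover P_Bot^(k) from P_Top^(k) by solving A P_Top + B P_Bot = c, eq. (33)."""
    short = {b: i for i, b in enumerate(product(range(n), repeat=k - 1))}
    long_ = list(product(range(n), repeat=k))
    M = [[0] * len(long_) for _ in short]
    for col, b in enumerate(long_):
        M[short[b[:-1]]][col] += 1      # contribution of R^(k)
        M[short[b[1:]]][col] -= 1       # contribution of -L^(k)
    M[-1] = [1] * len(long_)            # last equation replaced by normalization
    n_top = n**k - n**(k - 1)
    c = [0] * (len(M) - 1) + [1]
    rhs = [ci - sum(a * p for a, p in zip(row[:n_top], p_top))
           for ci, row in zip(c, M)]
    B = [row[n_top:] for row in M]
    return solve(B, rhs)


# P(000), P(001), P(010), P(011) of a hypothetical stationary Markov measure.
p_top = [Fraction(3, 8), Fraction(1, 8), Fraction(1, 12), Fraction(1, 12)]
print(bottom_from_top(2, 3, p_top))   # P(100), P(101), P(110), P(111)
```

Solving the linear system rather than inverting the matrix keeps the sketch short and avoids relying on any particular closed form of $\mathbf{B}^{(k)}$.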

Corollary 3.1 Among block probabilities constituting components of
$\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \ldots, \mathbf{P}^{(k)}$ we can treat the first
$N^k - N^{k-1}$ entries of $\mathbf{P}^{(k)}$ as independent variables. Remaining components
of $\mathbf{P}^{(k)}$ can be obtained by using eq. (32), while
$\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \ldots, \mathbf{P}^{(k-1)}$ can be obtained by eq. (22).

Representation of all blocks $\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \ldots, \mathbf{P}^{(k)}$ by the
first $N^k - N^{k-1}$ entries of $\mathbf{P}^{(k)}$ will be called long block representation. As
an example of this, let us consider the case of $A = \{0, 1, 2\}$ ($N = 3$) and
$\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \mathbf{P}^{(3)}$. We have $3^3 - 3^2 = 18$ independent block
probabilities, all of length 3. These are

{P(000), P(001), P(002), P(010), P(011), P(012), P(020), P(021),
P(022), P(100), P(101), P(102), P(110), P(111), P(112), P(120),
P(121), P(122)}.

Remaining 21 block probabilities, expressible in terms of the above, are

{P(200), P(201), P(202), P(210), P(211), P(212), P(220), P(221),
P(222), P(00), P(01), P(02), P(10), P(11), P(12), P(20), P(21),
P(22), P(0), P(1), P(2)}.



Since there are a total of $\sum_{i=1}^{k} N^i = (N^{k+1} - N)/(N - 1)$ block probabilities in
$\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \ldots, \mathbf{P}^{(k)}$, the fraction of independent block
probabilities among all block probabilities up to length $k$ is

$$\mathrm{Ind}(N, k) := \frac{(N - 1)(N^k - N^{k-1})}{N^{k+1} - N}. \tag{34}$$

For fixed $N$, $\mathrm{Ind}(N, k)$ decreases as a function of $k$, and tends to the limit

$$\lim_{k \to \infty} \mathrm{Ind}(N, k) = \frac{(N - 1)^2}{N^2}. \tag{35}$$

The above reaches its minimum $1/4$ at $N = 2$, thus $\mathrm{Ind}(N, k) > 1/4$ for all
$k \ge 1$, $N > 1$. This means that the long block representation is most "economical" for the
binary alphabet. For example, for $N = 2$ and $k = 3$, among
$\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \mathbf{P}^{(3)}$ we have only 4 independent blocks, P(000),
P(001), P(010) and P(011). The remaining 10 probabilities can be expressed as follows,

$$\begin{aligned}
P(100) &= P(001), \\
P(101) &= -P(001) + P(010) + P(011), \\
P(110) &= P(011), \\
P(111) &= 1 - P(000) - P(001) - 2P(010) - 3P(011), \\
P(00) &= P(000) + P(001), \\
P(01) &= P(010) + P(011), \\
P(10) &= P(010) + P(011), \\
P(11) &= 1 - P(000) - P(001) - 2P(010) - 2P(011), \\
P(0) &= P(000) + P(001) + P(010) + P(011), \\
P(1) &= 1 - P(000) - P(001) - P(010) - P(011).
\end{aligned} \tag{36}$$

Of course, the long block representation is not the only one possible. We will describe
below yet another representation, which is in some sense complementary to the long
block one. It declares as independent blocks of the shortest possible length, thus it will be
called short block representation.

It is constructed as follows. We start, as before, with block probabilities
$\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \ldots, \mathbf{P}^{(k)}$, and we arrange each of the vectors
$\mathbf{P}^{(i)}$ in a vertical column. An example of this is shown in Figure 1. In each vector
$\mathbf{P}^{(i)}$, we put aside the last $N^{i-1}$ entries, and in what remains, we underline
every $N$-th entry, starting from the top. Entries which are still left are framed (cf. Figure
1), and those we declare to be independent. It is straightforward to verify that we have
$N^k - N^{k-1}$ independent entries, as we should. Now how do we express dependent entries in
terms of independent ones? In each vector, starting from the left, we replace each underlined
entry by a linear combination of boxed entries from the same column and (possibly) entries
from the column on the left hand side, by following the path which starts with the
$\mapsto$ arrow and which ends at the underlined entry in question. For example, for P(02),
such a path is
$P(0) \overset{-}{\longmapsto} P(00) \overset{-}{\longrightarrow} P(01) \overset{=}{\longrightarrow} P(02)$.
Labels above arrows indicate how the equation is to be constructed, in this case

$$P(0) - P(00) - P(01) = P(02). \tag{37}$$

All arrows are labeled with "$-$", except those which point toward underlined entries, which
are labeled with "$=$".

Once we are done with all underlined entries in a given vector, we express all entries
marked as $\mathbf{P}^{(k)}_{\mathrm{Bot}}$ by $\mathbf{P}^{(k)}_{\mathrm{Top}}$, using eq. (32).
We then move to the next vector on the right and repeat the same procedure, until all vectors
are dealt with. By inspecting Figure 1, the reader can verify that the short block
representation utilizes short blocks as much as possible, and that, in fact, it is not possible
to declare a larger number of short blocks as independent.



In order to describe the above algorithm in a more formal way, let us define the vector of
admissible entries for short block representation, $\mathbf{P}^{(k)}_{\mathrm{adm}}$, as
follows. Let us take the vector $\mathbf{P}^{(k)}$ in which block probabilities are arranged in
lexicographical order, indexed by an index $i$ which runs from 1 to $N^k$. The vector
$\mathbf{P}^{(k)}_{\mathrm{adm}}$ consists of all entries of $\mathbf{P}^{(k)}$ for which the
index $i$ is not divisible by $N$ and for which $i \le N^k - N^{k-1}$. For example, for $N = 3$
and $k = 2$ we have

$$\mathbf{P}^{(2)} = [P(00), P(01), P(02), P(10), P(11), P(12), P(20), P(21), P(22)]^T,$$

and we need to select entries with $i$ not divisible by 3 and $i \le 6$, which leaves
$i = 1, 2, 4, 5$, hence

$$\mathbf{P}^{(2)}_{\mathrm{adm}} = [P(00), P(01), P(10), P(11)]^T.$$

[Figure 1: Generation of short block representation for $N = 3$ and $\mathbf{P}^{(k)}$ for
$k = 1, 2, 3$. Independent block probabilities are boxed, while dependent block probabilities
obtained from probabilities of shorter blocks are underlined.]
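The vector of independent block probabilities in short block representation is now defined as

$$\mathbf{P}^{(k)}_{\mathrm{short}} = \begin{bmatrix} \mathbf{P}^{(1)}_{\mathrm{adm}} \\ \mathbf{P}^{(2)}_{\mathrm{adm}} \\ \vdots \\ \mathbf{P}^{(k)}_{\mathrm{adm}} \end{bmatrix}. \tag{38}$$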
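The selection rule for admissible entries can be sketched in a few lines. The helper names below are hypothetical, blocks are printed as digit strings (which assumes $N \le 10$), and the upper bound on the index is taken to be inclusive, $i \le N^k - N^{k-1}$, since that is the reading under which the $N = 3$, $k = 2$ example in the text comes out right.

```python
from itertools import product


def admissible(n, k):
    """Blocks of P_adm^(k): 1-based lexical index i <= N^k - N^(k-1), N not dividing i."""
    limit = n**k - n**(k - 1)
    all_blocks = product(range(n), repeat=k)
    return ["".join(map(str, b)) for i, b in enumerate(all_blocks, start=1)
            if i <= limit and i % n != 0]


def short_form(n, k):
    """Labels of the entries of P_short^(k), eq. (38)."""
    return [b for j in range(1, k + 1) for b in admissible(n, j)]


print(admissible(3, 2))   # ['00', '01', '10', '11']
print(short_form(2, 3))   # ['0', '00', '000', '010']
```

The length of `short_form(n, k)` equals $(N-1)N^{k-1}$, the number of independent block probabilities given by Proposition 3.1.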



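For $N = 3$ and $k = 2$, elements of $\mathbf{P}^{(2)}_{\mathrm{short}}$ are shown in Figure 1
in red color. Note that the length of $\mathbf{P}^{(k)}_{\mathrm{short}}$ is the same as that of
$\mathbf{P}^{(k)}_{\mathrm{Top}}$. We can, therefore, transform one into the other by a linear
transformation. The form of this transformation can be deduced from Figure 1. Consider, for
example, $k = 2$, so that
$\mathbf{P}^{(2)}_{\mathrm{Top}} = [P(00), P(01), P(02), P(10), P(11), P(12)]^T$ and
$\mathbf{P}^{(2)}_{\mathrm{adm}} = [P(00), P(01), P(10), P(11)]^T$,
$\mathbf{P}^{(1)}_{\mathrm{adm}} = [P(0), P(1)]^T$. From Figure 1, we read

$$\begin{aligned}
P(00) &= P(00), \\
P(01) &= P(01), \\
P(02) &= P(0) - P(00) - P(01), \\
P(10) &= P(10), \\
P(11) &= P(11), \\
P(12) &= P(1) - P(10) - P(11),
\end{aligned} \tag{39}$$

where, if an element $P(\mathbf{b})$ of $\mathbf{P}^{(2)}_{\mathrm{Top}}$ was admissible, we
wrote $P(\mathbf{b}) = P(\mathbf{b})$, and if it was not admissible, we expressed it in terms
of probabilities of shorter blocks. The above can be written as

$$\mathbf{P}^{(2)}_{\mathrm{Top}} =
\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix}
\mathbf{P}^{(1)}_{\mathrm{adm}} +
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & -1 \end{bmatrix}
\mathbf{P}^{(2)}_{\mathrm{adm}}
= \mathbf{C}^{(2)}\mathbf{P}^{(1)}_{\mathrm{Top}} + \mathbf{D}^{(2)}\mathbf{P}^{(2)}_{\mathrm{adm}}. \tag{40}$$

This expresses $\mathbf{P}^{(2)}_{\mathrm{Top}}$ in terms of $\mathbf{P}^{(1)}_{\mathrm{adm}}$
and $\mathbf{P}^{(2)}_{\mathrm{adm}}$, that is, in terms of $\mathbf{P}^{(2)}_{\mathrm{short}}$.
One can similarly show that for general $k > 1$,

$$\mathbf{P}^{(k)}_{\mathrm{Top}} = \mathbf{C}^{(k)}\mathbf{P}^{(k-1)}_{\mathrm{Top}} + \mathbf{D}^{(k)}\mathbf{P}^{(k)}_{\mathrm{adm}}, \tag{41}$$

where

$$\mathbf{C}^{(k)} = \mathrm{diag}(\underbrace{\mathbf{e}_N, \mathbf{e}_N, \ldots, \mathbf{e}_N}_{N^{k-1} - N^{k-2}}), \qquad
\mathbf{D}^{(k)} = \mathrm{diag}(\underbrace{\mathbf{B}_N, \mathbf{B}_N, \ldots, \mathbf{B}_N}_{N^{k-1} - N^{k-2}}), \tag{42}$$

$$\mathbf{e}_N = [\underbrace{0, \ldots, 0, 1}_{N}]^T, \qquad
\mathbf{B}_N = \begin{bmatrix} \mathbf{I}_{N-1} \\ \underbrace{-1, -1, \ldots, -1}_{N-1} \end{bmatrix}. \tag{43}$$

Note that $\mathbf{C}^{(k)}$ has $N^k - N^{k-1}$ rows and $N^{k-1} - N^{k-2}$ columns, while
$\mathbf{D}^{(k)}$ has $N^k - N^{k-1}$ rows and $(N^{k-1} - N^{k-2})(N - 1)$ columns. Applying
eq. (41) $k - 1$ times recursively, one obtains, for $k > 2$,

$$\mathbf{P}^{(k)}_{\mathrm{Top}} = \mathbf{C}^{(k)}\mathbf{C}^{(k-1)} \cdots \mathbf{C}^{(2)}\mathbf{P}^{(1)}_{\mathrm{adm}}
+ \sum_{i=2}^{k-1} \mathbf{C}^{(k)}\mathbf{C}^{(k-1)} \cdots \mathbf{C}^{(i+1)}\mathbf{D}^{(i)}\mathbf{P}^{(i)}_{\mathrm{adm}}
+ \mathbf{D}^{(k)}\mathbf{P}^{(k)}_{\mathrm{adm}}. \tag{44}$$

When $k = 2$ no recursion is needed, as eq. (41) becomes

$$\mathbf{P}^{(2)}_{\mathrm{Top}} = \mathbf{C}^{(2)}\mathbf{P}^{(1)}_{\mathrm{adm}} + \mathbf{D}^{(2)}\mathbf{P}^{(2)}_{\mathrm{adm}}, \tag{45}$$

and for $k = 1$ we simply have

$$\mathbf{P}^{(1)}_{\mathrm{Top}} = \mathbf{P}^{(1)}_{\mathrm{adm}} = \mathbf{P}^{(1)}_{\mathrm{short}}. \tag{46}$$

If we define

$$\mathbf{M}^{(k)}_{\mathrm{short}} = \begin{cases}
[\mathbf{C}^{(k)}\mathbf{C}^{(k-1)} \cdots \mathbf{C}^{(2)},\ \underbrace{\mathbf{C}^{(k)}\mathbf{C}^{(k-1)} \cdots \mathbf{C}^{(i+1)}\mathbf{D}^{(i)}}_{\text{repeated for } i = 2 \ldots k-1},\ \mathbf{D}^{(k)}], & k > 2, \\
[\mathbf{C}^{(2)}, \mathbf{D}^{(2)}], & k = 2, \\
\mathbf{I}_{N-1}, & k = 1,
\end{cases} \tag{47}$$

then equations (44)-(46) can be written as

$$\mathbf{P}^{(k)}_{\mathrm{Top}} = \mathbf{M}^{(k)}_{\mathrm{short}}\mathbf{P}^{(k)}_{\mathrm{short}}. \tag{48}$$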



Proposition 3.3 Among block probabilities constituting components of
$\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \ldots, \mathbf{P}^{(k)}$, we can treat entries of
$\mathbf{P}^{(k)}_{\mathrm{short}}$ as independent variables. One can express the first
$N^k - N^{k-1}$ components of $\mathbf{P}^{(k)}$ by $\mathbf{P}^{(k)}_{\mathrm{short}}$ by means
of eq. (48). Remaining components of $\mathbf{P}^{(k)}$ can be obtained by using eq. (32), while
$\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \ldots, \mathbf{P}^{(k-1)}$ can be obtained by eq. (22).
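The matrix $\mathbf{M}^{(k)}_{\mathrm{short}}$ can be assembled mechanically from block-diagonal building blocks. The sketch below is one possible implementation under the reading of $\mathbf{C}^{(k)}$ and $\mathbf{D}^{(k)}$ as block-diagonal matrices made of copies of the column $\mathbf{e}_N$ and of the $N \times (N-1)$ matrix $\mathbf{B}_N$; all function names are hypothetical, and plain nested lists are used instead of a linear algebra library.

```python
def block_diag(block, copies):
    """Block-diagonal matrix made of `copies` copies of `block` (a list of rows)."""
    rows, cols = len(block), len(block[0])
    out = [[0] * (cols * copies) for _ in range(rows * copies)]
    for c in range(copies):
        for i in range(rows):
            for j in range(cols):
                out[c * rows + i][c * cols + j] = block[i][j]
    return out


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def hstack(*mats):
    return [sum((m[i] for m in mats), []) for i in range(len(mats[0]))]


def C(n, k):
    e = [[0]] * (n - 1) + [[1]]                      # e_N as a column
    return block_diag(e, n**(k - 1) - n**(k - 2))


def D(n, k):
    b = [[int(i == j) for j in range(n - 1)] for i in range(n - 1)] + [[-1] * (n - 1)]
    return block_diag(b, n**(k - 1) - n**(k - 2))    # copies of B_N


def m_short(n, k):
    """M_short^(k), so that P_Top^(k) = M_short^(k) P_short^(k)."""
    if k == 1:
        return [[int(i == j) for j in range(n - 1)] for i in range(n - 1)]
    parts = [D(n, k)]
    prefix = C(n, k)                                  # running product C^(k) ... C^(i+1)
    for i in range(k - 1, 1, -1):
        parts.insert(0, matmul(prefix, D(n, i)))
        prefix = matmul(prefix, C(n, i))
    parts.insert(0, prefix)                           # C^(k) ... C^(2)
    return hstack(*parts)


for row in m_short(2, 3):
    print(row)
```

For $N = 3$, $k = 2$ the result is the horizontal concatenation of the two matrices appearing in the worked $k = 2$ example, and for $N = 2$, $k = 3$ it is a $4 \times 4$ matrix acting on $[P(0), P(00), P(000), P(010)]^T$.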

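Let us now apply the procedure described above to the $N = 2$ and $k = 3$ case, the same as we
already considered for the long block representation. Among components of
$\mathbf{P}^{(1)}$, $\mathbf{P}^{(2)}$ and $\mathbf{P}^{(3)}$ we have only four independent block
probabilities, $\mathbf{P}^{(3)}_{\mathrm{short}} = [P(0), P(00), P(000), P(010)]^T$, and 10
dependent probabilities. We first partition $\mathbf{P}^{(3)}$ into two subvectors,
$\mathbf{P}^{(3)}_{\mathrm{Top}} = [P(000), P(001), P(010), P(011)]^T$ and
$\mathbf{P}^{(3)}_{\mathrm{Bot}} = [P(100), P(101), P(110), P(111)]^T$. Eq. (48) takes the form

$$\mathbf{P}^{(3)}_{\mathrm{Top}} =
\begin{bmatrix} P(000) \\ P(001) \\ P(010) \\ P(011) \end{bmatrix}
= \mathbf{M}^{(3)}_{\mathrm{short}}\mathbf{P}^{(3)}_{\mathrm{short}} =
\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & -1 & 0 & -1 \end{bmatrix}
\begin{bmatrix} P(0) \\ P(00) \\ P(000) \\ P(010) \end{bmatrix}. \tag{49}$$

Components of $\mathbf{P}^{(3)}_{\mathrm{Bot}}$ can be obtained from eq. (32),

$$\begin{bmatrix} P(100) \\ P(101) \\ P(110) \\ P(111) \end{bmatrix} =
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} -
\begin{bmatrix} 0 & -1 & 0 & 0 \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & -1 \\ 1 & 1 & 2 & 3 \end{bmatrix}
\begin{bmatrix} P(000) \\ P(001) \\ P(010) \\ P(011) \end{bmatrix}, \tag{50}$$

and we can use eq. (48) again to replace $[P(000), P(001), P(010), P(011)]^T$ on the right hand
side by $\mathbf{M}^{(3)}_{\mathrm{short}}\mathbf{P}^{(3)}_{\mathrm{short}}$. Equations (49) and
(50), therefore, yield all components of $\mathbf{P}^{(3)}$. By applying eq. (22) we can obtain
$\mathbf{P}^{(2)}$ and $\mathbf{P}^{(1)}$. This will yield the following 10 dependent block
probabilities expressed in terms of elements of $\mathbf{P}^{(3)}_{\mathrm{short}}$:

$$\begin{aligned}
\begin{bmatrix} P(001) \\ P(011) \\ P(100) \\ P(101) \\ P(110) \\ P(111) \end{bmatrix} &=
\begin{bmatrix} P(00) - P(000) \\ P(0) - P(00) - P(010) \\ P(00) - P(000) \\ P(0) - 2P(00) + P(000) \\ P(0) - P(00) - P(010) \\ 1 - 3P(0) + 2P(00) + P(010) \end{bmatrix}, \\
\begin{bmatrix} P(01) \\ P(10) \\ P(11) \end{bmatrix} &=
\begin{bmatrix} P(0) - P(00) \\ P(0) - P(00) \\ 1 - 2P(0) + P(00) \end{bmatrix}, \\
P(1) &= 1 - P(0).
\end{aligned} \tag{51}$$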

We can see that the resulting expressions are shorter than in the case of the long block representation. Indeed, the short block representation is more "natural" and often helps to gain insight into the properties of the probability measure it describes. We will see this when this representation is used to simplify local structure theory equations.
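The identities of eq. (51) hold for every shift-invariant measure, so they can be checked numerically. The sketch below does this for one test measure chosen purely for illustration: a stationary two-state Markov chain whose transition matrix `T` is an assumption and does not come from the paper.

```python
# Hypothetical test measure: a stationary two-state Markov chain.
T = [[0.7, 0.3], [0.4, 0.6]]            # assumed transition probabilities
pi0 = T[1][0] / (T[0][1] + T[1][0])     # stationary probability of symbol 0
PI = [pi0, 1.0 - pi0]


def P(block):
    """Probability of a block (string of '0'/'1') under the chain."""
    s = [int(c) for c in block]
    prob = PI[s[0]]
    for x, y in zip(s, s[1:]):
        prob *= T[x][y]
    return prob


# Independent ("short form") block probabilities for N = 2, k = 3.
p0, p00, p000, p010 = P("0"), P("00"), P("000"), P("010")

# Dependent block probabilities, written exactly as in eq. (51).
short_form = {
    "001": p00 - p000,
    "011": p0 - p00 - p010,
    "100": p00 - p000,
    "101": p0 - 2 * p00 + p000,
    "110": p0 - p00 - p010,
    "111": 1 - 3 * p0 + 2 * p00 + p010,
    "01": p0 - p00,
    "10": p0 - p00,
    "11": 1 - 2 * p0 + p00,
    "1": 1 - p0,
}

# Largest deviation between the directly computed probability and eq. (51).
max_error = max(abs(P(b) - v) for b, v in short_form.items())
print(max_error)
```

Any other shift-invariant measure could be substituted for `P`; the Markov chain is used only because its block probabilities are easy to compute exactly.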

4. Bayesian Extension 

From what we have seen so far, it is clear that the knowledge of $P^{(k)}$ is enough to determine all $P^{(i)}$ with $i < k$. What about $i > k$? Obviously, since the number of independent components in $P^{(i)}$ is greater than in $P^{(k)}$ for $i > k$, there is no hope to determine $P^{(i)}$ using only $P^{(k)}$. It is possible, however, to approximate longer block probabilities by shorter block probabilities using the idea of Bayesian extension.

Suppose now that we want to approximate $P(a_1 a_2 \ldots a_{k+1})$ by $P(a_1 a_2 \ldots a_k)$. One can say that by knowing $P(a_1 a_2 \ldots a_k)$ we know how values of individual symbols in a block are correlated, provided that the symbols are not farther apart than $k - 1$. We do not know, however, anything about correlations on the larger length scale. The only thing we can do in this situation is to simply neglect these higher length correlations, and assume that if a block of length $k$ is extended by adding another symbol to it on the right, then the conditional probability of finding a particular value of that symbol does not significantly depend on the left-most symbol, i.e.,

$$\frac{P(a_1 a_2 \ldots a_{k+1})}{P(a_1 \ldots a_k)} \approx \frac{P(a_2 \ldots a_{k+1})}{P(a_2 \ldots a_k)}. \tag{52}$$

This produces the desired approximation of $(k+1)$-block probabilities by $k$-block and $(k-1)$-block probabilities,

$$P(a_1 a_2 \ldots a_{k+1}) \approx \frac{P(a_1 \ldots a_k) P(a_2 \ldots a_{k+1})}{P(a_2 \ldots a_k)}, \tag{53}$$

where we assume that the denominator is positive. If the denominator is zero, then we take $P(a_1 a_2 \ldots a_{k+1}) = 0$. In order to avoid writing separate cases for denominator equal to zero, we define "thick bar" fraction as

$$\genfrac{}{}{1.5pt}{}{a}{b} := \begin{cases} \dfrac{a}{b} & \text{if } b \neq 0, \\ 0 & \text{if } b = 0. \end{cases} \tag{54}$$

Note that eq. (53) only makes sense if $k > 1$. For $k = 1$, the approximation is

$$P(a_1 a_2) \approx P(a_1) P(a_2). \tag{55}$$

Again, in order to avoid writing the $k = 1$ case separately, we adopt the notational convention that the probability of an empty block is one,

$$P(a_m \ldots a_n) = 1 \quad \text{whenever } m > n, \tag{56}$$

and then eq. (53) remains valid even for $k = 1$. Using notational conventions given in eqs. (54) and (56) and applying our approximation recursively $m$ times we can express $(k+m)$-block probabilities in terms of $k$-block and $(k-1)$-block probabilities,

$$P(a_1 a_2 \ldots a_{k+m}) \approx \genfrac{}{}{1.5pt}{}{\prod_{i=1}^{m+1} P(a_i \ldots a_{i+k-1})}{\prod_{i=1}^{m} P(a_{i+1} \ldots a_{i+k-1})}. \tag{57}$$

Note that if we want, we can write the right hand side of the above in terms of only $k$-block probabilities, by substituting in the denominator

$$P(a_{i+1} \ldots a_{i+k-1}) = \sum_{b \in \mathcal{A}} P(a_{i+1} \ldots a_{i+k-1} b). \tag{58}$$

Proposition 4.1 Let $\mu \in \mathfrak{M}_\sigma(X)$ be a measure with associated block probabilities $P : \mathcal{A}^* \to [0, 1]$, $P(\mathbf{b}) = \mu([\mathbf{b}]_i)$ for all $i \in \mathbb{Z}$ and $\mathbf{b} \in \mathcal{A}^*$. For $k > 0$, define $\widetilde{P} : \mathcal{A}^* \to [0, 1]$ such that

$$\widetilde{P}(a_1 a_2 \ldots a_p) = \begin{cases} P(a_1 a_2 \ldots a_p) & \text{if } p \leq k, \\[4pt] \genfrac{}{}{1.5pt}{}{\prod_{i=1}^{p-k+1} P(a_i \ldots a_{i+k-1})}{\prod_{i=1}^{p-k} P(a_{i+1} \ldots a_{i+k-1})} & \text{otherwise.} \end{cases} \tag{59}$$

Then $\widetilde{P}$ determines a shift-invariant probability measure $\widetilde{\mu}^{(k)} \in \mathfrak{M}_\sigma(X)$, to be called Bayesian approximation of $\mu$ of order $k$.

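Proof. If $\mathbf{b} = b_1 b_2 \ldots b_n$, we will denote subblocks of $\mathbf{b}$ by $\mathbf{b}_{[i,j]} = b_i b_{i+1} \ldots b_j$. Using Theorem 3.1, all we need to do is to show that conditions (11) and (12) are satisfied by $\widetilde{P}$. The second one holds for $\widetilde{P}$, because it obviously holds for $P$. For the same reason eq. (11) holds for block $\mathbf{b}$ of length up to $k - 1$. For $\mathbf{b} = b_1 b_2 \ldots b_p$, $p \geq k$, we have

$$\begin{aligned}
\sum_{a \in \mathcal{A}} \widetilde{P}(\mathbf{b}a) &= \sum_{a \in \mathcal{A}} \genfrac{}{}{1.5pt}{}{\prod_{i=1}^{p-k+1} P(\mathbf{b}_{[i,i+k-1]}) \cdot P(\mathbf{b}_{[p-k+2,p]}a)}{\prod_{i=1}^{p-k} P(\mathbf{b}_{[i+1,i+k-1]}) \cdot P(\mathbf{b}_{[p-k+2,p]})} \\
&= \genfrac{}{}{1.5pt}{}{\prod_{i=1}^{p-k+1} P(\mathbf{b}_{[i,i+k-1]})}{\prod_{i=1}^{p-k} P(\mathbf{b}_{[i+1,i+k-1]}) \cdot P(\mathbf{b}_{[p-k+2,p]})} \sum_{a \in \mathcal{A}} P(\mathbf{b}_{[p-k+2,p]}a) \\
&= \genfrac{}{}{1.5pt}{}{\prod_{i=1}^{p-k+1} P(\mathbf{b}_{[i,i+k-1]})}{\prod_{i=1}^{p-k} P(\mathbf{b}_{[i+1,i+k-1]}) \cdot P(\mathbf{b}_{[p-k+2,p]})} \, P(\mathbf{b}_{[p-k+2,p]}) = \widetilde{P}(\mathbf{b}).
\end{aligned} \tag{60}$$

One can similarly prove that $\sum_{a \in \mathcal{A}} \widetilde{P}(a\mathbf{b}) = \widetilde{P}(\mathbf{b})$. $\Box$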
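The construction of eq. (59) and the consistency property of eq. (60) can be illustrated with a short Python sketch. The test measure is again a stationary two-state Markov chain; the matrix `T` and the helper names `P`, `thick_div` and `P_tilde` are assumptions introduced only for this example.

```python
ALPHABET = "01"
T = [[0.7, 0.3], [0.4, 0.6]]            # assumed transition probabilities
pi0 = T[1][0] / (T[0][1] + T[1][0])     # stationary probability of symbol 0
PI = [pi0, 1.0 - pi0]


def P(block):
    """Exact block probability; the empty block has probability 1, cf. eq. (56)."""
    if not block:
        return 1.0
    s = [int(c) for c in block]
    prob = PI[s[0]]
    for x, y in zip(s, s[1:]):
        prob *= T[x][y]
    return prob


def thick_div(a, b):
    """'Thick bar' fraction of eq. (54): a/b, or 0 when b == 0."""
    return a / b if b != 0 else 0.0


def P_tilde(block, k):
    """Bayesian approximation of order k, eq. (59)."""
    p = len(block)
    if p <= k:
        return P(block)
    num = 1.0
    for i in range(p - k + 1):           # k-blocks a_i ... a_{i+k-1}
        num *= P(block[i:i + k])
    den = 1.0
    for i in range(1, p - k + 1):        # (k-1)-blocks a_{i+1} ... a_{i+k-1}
        den *= P(block[i:i + k - 1])
    return thick_div(num, den)


exact = P("0110")
order2 = P_tilde("0110", 2)   # the chain is fixed by its 2-blocks, so this matches `exact`
order1 = P_tilde("0110", 1)   # order 1 reduces to a product of single-symbol probabilities
# Consistency condition of eq. (60): summing over the appended symbol.
marginal_gap = abs(sum(P_tilde("011" + a, 2) for a in ALPHABET) - P_tilde("011", 2))
print(exact, order2, order1, marginal_gap)
```

For this particular measure the order-2 approximation is exact, which is the situation described next as a Markov measure; for a general measure only the consistency property is guaranteed.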

When there exists $k$ such that Bayesian approximation of $\mu$ of order $k$ is equal to $\mu$, we call $\mu$ a Markov measure or a finite block measure of order $k$. The space of Markov measures of order $k$ will be denoted by $\mathfrak{M}^{(k)}(X)$,

$$\mathfrak{M}^{(k)}(X) = \{\mu \in \mathfrak{M}_\sigma(X) : \mu = \widetilde{\mu}^{(k)}\}. \tag{61}$$

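It is often said that the Bayesian approximation "maximizes entropy". In order to state this property in a formal way, let us define entropy density of shift-invariant measure $\mu \in \mathfrak{M}_\sigma(X)$ as

$$h(\mu) = \lim_{n \to \infty} -\frac{1}{n} \sum_{\mathbf{b} \in \mathcal{A}^n} P(\mathbf{b}) \log P(\mathbf{b}), \tag{62}$$

where, as usual, $P(\mathbf{b}) = \mu([\mathbf{b}]_i)$ for all $i \in \mathbb{Z}$ and $\mathbf{b} \in \mathcal{A}^*$. The following two propositions and the main ideas behind their proofs are due to M. Fannes and A. Verbeure [8].

Proposition 4.2 For any $\mu \in \mathfrak{M}_\sigma(X)$, the entropy density of the $k$-th order Bayesian approximation of $\mu$ is given by

$$h(\widetilde{\mu}^{(k)}) = \sum_{\mathbf{a} \in \mathcal{A}^{k-1}} P(\mathbf{a}) \log P(\mathbf{a}) - \sum_{\mathbf{a} \in \mathcal{A}^{k}} P(\mathbf{a}) \log P(\mathbf{a}). \tag{63}$$

Proof: Since we are interested in the $n \to \infty$ limit, let us consider $n > k$. Then

$$\begin{aligned}
\sum_{\mathbf{b} \in \mathcal{A}^n} \widetilde{P}(\mathbf{b}) \log \widetilde{P}(\mathbf{b}) &= \sum_{\mathbf{b} \in \mathcal{A}^n} \widetilde{P}(\mathbf{b}) \log \genfrac{}{}{1.5pt}{}{\prod_{i=1}^{n-k+1} P(\mathbf{b}_{[i,i+k-1]})}{\prod_{i=1}^{n-k} P(\mathbf{b}_{[i+1,i+k-1]})} \\
&= \sum_{\mathbf{b} \in \mathcal{A}^n} \widetilde{P}(\mathbf{b}) \sum_{i=1}^{n-k+1} \log P(\mathbf{b}_{[i,i+k-1]}) - \sum_{\mathbf{b} \in \mathcal{A}^n} \widetilde{P}(\mathbf{b}) \sum_{i=1}^{n-k} \log P(\mathbf{b}_{[i+1,i+k-1]}).
\end{aligned} \tag{64}$$

For any $i \in [1, n-k+1]$,

$$\begin{aligned}
\sum_{\mathbf{b} \in \mathcal{A}^n} \widetilde{P}(\mathbf{b}) \log P(\mathbf{b}_{[i,i+k-1]}) &= \sum_{\mathbf{b}_{[1,i-1]}} \sum_{\mathbf{b}_{[i,i+k-1]}} \sum_{\mathbf{b}_{[i+k,n]}} \widetilde{P}(\mathbf{b}_{[1,i-1]} \mathbf{b}_{[i,i+k-1]} \mathbf{b}_{[i+k,n]}) \log P(\mathbf{b}_{[i,i+k-1]}) \\
&= \sum_{\mathbf{a} \in \mathcal{A}^k} \widetilde{P}(\mathbf{a}) \log P(\mathbf{a}) = \sum_{\mathbf{a} \in \mathcal{A}^k} P(\mathbf{a}) \log P(\mathbf{a}),
\end{aligned} \tag{65}$$

and, by the same reasoning, for any $i \in [1, n-k]$,

$$\sum_{\mathbf{b} \in \mathcal{A}^n} \widetilde{P}(\mathbf{b}) \log P(\mathbf{b}_{[i+1,i+k-1]}) = \sum_{\mathbf{a} \in \mathcal{A}^{k-1}} P(\mathbf{a}) \log P(\mathbf{a}). \tag{66}$$

Using this, eq. (64) becomes

$$\sum_{\mathbf{b} \in \mathcal{A}^n} \widetilde{P}(\mathbf{b}) \log \widetilde{P}(\mathbf{b}) = (n-k+1) \sum_{\mathbf{a} \in \mathcal{A}^k} P(\mathbf{a}) \log P(\mathbf{a}) - (n-k) \sum_{\mathbf{a} \in \mathcal{A}^{k-1}} P(\mathbf{a}) \log P(\mathbf{a}). \tag{67}$$

Dividing this by $-n$ and taking the limit $n \to \infty$ we obtain the desired expression. $\Box$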
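As a sanity check of eq. (63), the sketch below evaluates its right hand side for a stationary two-state Markov chain and compares it with the familiar entropy rate of such a chain, $-\sum_{i,j} \pi_i T_{ij} \log T_{ij}$. The chain (matrix `T`) and the function names are assumptions made for this illustration only.

```python
import itertools
import math

ALPHABET = "01"
T = [[0.7, 0.3], [0.4, 0.6]]            # assumed transition probabilities
pi0 = T[1][0] / (T[0][1] + T[1][0])     # stationary probability of symbol 0
PI = [pi0, 1.0 - pi0]


def P(block):
    """Block probability under the chain; the empty block has probability 1."""
    if not block:
        return 1.0
    s = [int(c) for c in block]
    prob = PI[s[0]]
    for x, y in zip(s, s[1:]):
        prob *= T[x][y]
    return prob


def plogp_sum(length):
    """Sum of P(a) log P(a) over all blocks a of the given length."""
    total = 0.0
    for a in itertools.product(ALPHABET, repeat=length):
        p = P("".join(a))
        if p > 0:
            total += p * math.log(p)
    return total


def bayes_entropy_density(k):
    """Right hand side of eq. (63)."""
    return plogp_sum(k - 1) - plogp_sum(k)


h_order2 = bayes_entropy_density(2)
h_order3 = bayes_entropy_density(3)
h_markov = -sum(PI[i] * T[i][j] * math.log(T[i][j]) for i in range(2) for j in range(2))
print(h_order2, h_order3, h_markov)
```

Because the chain coincides with its own Bayesian approximation of order 2 (and hence of order 3), all three numbers should agree up to rounding.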

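Theorem 4.1 For any $\mu \in \mathfrak{M}_\sigma(X)$ and any $k > 0$, the entropy density of $\mu$ does not exceed the entropy density of its $k$-th order Bayesian approximation,

$$h(\mu) \leq h(\widetilde{\mu}^{(k)}). \tag{68}$$

Proof: Let

$$H_n(\mu) = -\sum_{\mathbf{b} \in \mathcal{A}^n} P(\mathbf{b}) \log P(\mathbf{b}), \tag{69}$$

$$H_n(\widetilde{\mu}^{(k)}) = -\sum_{\mathbf{b} \in \mathcal{A}^n} \widetilde{P}(\mathbf{b}) \log \widetilde{P}(\mathbf{b}). \tag{70}$$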

(71) 



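We will use convexity of $f(x) = x \log x$,

$$x \log x - y \log y \leq (x - y)(1 + \log x). \tag{72}$$

Applying this inequality to $H_n(\mu) - H_n(\widetilde{\mu}^{(k)})$ for $n > k$ we obtain

$$\begin{aligned}
H_n(\mu) - H_n(\widetilde{\mu}^{(k)}) &= \sum_{\mathbf{b} \in \mathcal{A}^n} \widetilde{P}(\mathbf{b}) \log \widetilde{P}(\mathbf{b}) - \sum_{\mathbf{b} \in \mathcal{A}^n} P(\mathbf{b}) \log P(\mathbf{b}) \\
&\leq \sum_{\mathbf{b} \in \mathcal{A}^n} \big(\widetilde{P}(\mathbf{b}) - P(\mathbf{b})\big)\big(1 + \log \widetilde{P}(\mathbf{b})\big) \\
&= \sum_{\mathbf{b} \in \mathcal{A}^n} \widetilde{P}(\mathbf{b}) \log \widetilde{P}(\mathbf{b}) - \sum_{\mathbf{b} \in \mathcal{A}^n} P(\mathbf{b}) \log \widetilde{P}(\mathbf{b}),
\end{aligned} \tag{73}$$

where we used the fact that $\sum_{\mathbf{b} \in \mathcal{A}^n} P(\mathbf{b}) = \sum_{\mathbf{b} \in \mathcal{A}^n} \widetilde{P}(\mathbf{b}) = 1$. Note that we already computed the value of $\sum_{\mathbf{b} \in \mathcal{A}^n} \widetilde{P}(\mathbf{b}) \log \widetilde{P}(\mathbf{b})$ (cf. eq. (67)). Also note that nothing would change in the derivation of eq. (67) if we were computing $\sum_{\mathbf{b} \in \mathcal{A}^n} P(\mathbf{b}) \log \widetilde{P}(\mathbf{b})$ instead, meaning that

$$\sum_{\mathbf{b} \in \mathcal{A}^n} \widetilde{P}(\mathbf{b}) \log \widetilde{P}(\mathbf{b}) = \sum_{\mathbf{b} \in \mathcal{A}^n} P(\mathbf{b}) \log \widetilde{P}(\mathbf{b}). \tag{74}$$

We therefore obtain

$$H_n(\mu) - H_n(\widetilde{\mu}^{(k)}) \leq 0. \tag{75}$$

Dividing this by $n$ and taking the limit $n \to \infty$ we obtain inequality (68). $\Box$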
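Inequality (75) can be observed numerically for a measure that is not Markov. The sketch below uses an equal mixture of two Bernoulli measures, which is shift-invariant but has long-range correlations; the mixture parameters and all function names are assumptions chosen for illustration.

```python
import itertools
import math

ALPHABET = "01"


def P(block, p1=0.2, p2=0.8):
    """Block probability under an equal mixture of Bernoulli(p1) and Bernoulli(p2)."""
    ones = block.count("1")
    zeros = len(block) - ones
    return 0.5 * p1**ones * (1 - p1)**zeros + 0.5 * p2**ones * (1 - p2)**zeros


def P_tilde(block, k):
    """Bayesian approximation of order k, eq. (59)."""
    if len(block) <= k:
        return P(block)
    num = math.prod(P(block[i:i + k]) for i in range(len(block) - k + 1))
    den = math.prod(P(block[i:i + k - 1]) for i in range(1, len(block) - k + 1))
    return num / den if den != 0 else 0.0


def block_entropy(prob, n):
    """H_n = -sum of p log p over all blocks of length n, cf. eqs. (69)-(70)."""
    total = 0.0
    for b in itertools.product(ALPHABET, repeat=n):
        p = prob("".join(b))
        if p > 0:
            total -= p * math.log(p)
    return total


k, n = 2, 6
H_exact = block_entropy(P, n)
H_bayes = block_entropy(lambda b: P_tilde(b, k), n)
# Proposition 4.1 implies that the approximate probabilities still sum to one.
total_mass = sum(P_tilde("".join(b), k) for b in itertools.product(ALPHABET, repeat=n))
print(H_exact, H_bayes, total_mass)
```

The comparison is made at finite $n$, which is exactly the statement of eq. (75); the limit in eq. (68) is not evaluated here.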

Let $\mu, \mu_n \in \mathfrak{M}(X)$. If $\int_X f \, d\mu_n \to \int_X f \, d\mu$ as $n \to \infty$ for every bounded, continuous real function $f$ on $X$, we say that $\mu_n$ converges weakly to $\mu$ and write $\mu_n \Rightarrow \mu$. Proof of the following useful criterion of weak convergence, originally due to Kolmogorov and Prohorov [13], can be found in [14].

Theorem 4.2 Let $\mathcal{U}$ be a subclass of the smallest $\sigma$-algebra containing all open sets of $X$ such that (i) $\mathcal{U}$ is closed under the formation of finite intersections and (ii) each open set in $X$ is a finite or countable union of elements of $\mathcal{U}$. If $\mu_n(A) \to \mu(A)$ for every $A \in \mathcal{U}$, then $\mu_n \Rightarrow \mu$.

The subclass $\mathcal{U}$ satisfying the hypothesis of the above theorem is called a convergence determining class. It is easy to verify that $Cyl(X)$ is a convergence determining class for measures in $\mathfrak{M}(X)$, hence the following proposition.

Proposition 4.3 The sequence of $k$-th order Bayesian approximations of $\mu \in \mathfrak{M}_\sigma(X)$ weakly converges to $\mu$ as $k \to \infty$.

Proof: Let $n > 0$, $\mathbf{b} \in \mathcal{A}^n$ and let $P_k(\mathbf{b}) = \widetilde{\mu}^{(k)}([\mathbf{b}]_0)$, $P(\mathbf{b}) = \mu([\mathbf{b}]_0)$. Since for $k \geq n$ we have $P_k(\mathbf{b}) = P(\mathbf{b})$, we obviously have $\lim_{k \to \infty} P_k(\mathbf{b}) = P(\mathbf{b})$. Theorem 4.2 coupled with the fact that $Cyl(X)$ is a convergence determining class leads to the conclusion that $\widetilde{\mu}^{(k)} \Rightarrow \mu$. $\Box$

5. Cellular automata

Let $w : \mathcal{A} \times \mathcal{A}^{2r+1} \to [0, 1]$, whose values are denoted by $w(a|\mathbf{b})$ for $a \in \mathcal{A}$, $\mathbf{b} \in \mathcal{A}^{2r+1}$, satisfying $\sum_{a \in \mathcal{A}} w(a|\mathbf{b}) = 1$, be called local transition function of radius $r$, and its values will be called local transition probabilities. Probabilistic cellular automaton with local transition function $w$ is a map $F : \mathfrak{M}(X) \to \mathfrak{M}(X)$ defined as

$$(F\mu)([\mathbf{b}]_i) = \sum_{\mathbf{a} \in \mathcal{A}^{|\mathbf{b}|+2r}} w(\mathbf{b}|\mathbf{a}) \, \mu([\mathbf{a}]_{i-r}) \quad \text{for all } i \in \mathbb{Z},\ \mathbf{b} \in \mathcal{A}^*, \tag{76}$$

where we define

$$w(\mathbf{b}|\mathbf{a}) = \prod_{j=1}^{|\mathbf{b}|} w(b_j | a_j a_{j+1} \ldots a_{j+2r}). \tag{77}$$

When the function $w$ takes values in the set $\{0, 1\}$, the corresponding cellular automaton is called deterministic CA.

For any probability measure $\mu \in \mathfrak{M}(X)$, we define the orbit of $\mu$ under $F$ as

$$\{F^n \mu\}_{n=0}^{\infty}. \tag{78}$$

In general, it is very difficult to compute $F^n\mu$ directly, and no general method for doing this is known. To see the source of the difficulty, let us take $\mathcal{A} = \{0, 1\}$ and let us consider the example of rule 14, for which local transition probabilities are given by

$$\begin{aligned}
w(1|000) &= 0, & w(1|001) &= 1, & w(1|010) &= 1, & w(1|011) &= 1, \\
w(1|100) &= 0, & w(1|101) &= 0, & w(1|110) &= 0, & w(1|111) &= 0.
\end{aligned} \tag{79}$$

Let us further suppose that we want to compute the orbit of a shift-invariant Bernoulli measure $\mu_{1/2}$, such that for any block $\mathbf{b} \in \mathcal{A}^*$, $\mu_{1/2}([\mathbf{b}]) = (1/2)^{|\mathbf{b}|}$. If we, for example, consider blocks $\mathbf{b}$ of length 2, then, defining $P_n(\mathbf{b}) = (F^n\mu_{1/2})([\mathbf{b}])$, equation (76) becomes

$$\begin{aligned}
P_{n+1}(00) &= P_n(0000) + P_n(1000) + P_n(1100) + P_n(1101) + P_n(1110) + P_n(1111), \\
P_{n+1}(01) &= P_n(0001) + P_n(1001) + P_n(1010) + P_n(1011), \\
P_{n+1}(10) &= P_n(0100) + P_n(0101) + P_n(0110) + P_n(0111), \\
P_{n+1}(11) &= P_n(0010) + P_n(0011).
\end{aligned} \tag{80}$$
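For a deterministic rule, the right hand sides of eq. (80) are simply sums over the preimages of each 2-block. A minimal Python sketch of this enumeration is given below; the function names `rule_table` and `preimages` are hypothetical, and the rule is specified by its Wolfram number.

```python
import itertools


def rule_table(number):
    """Local function of an elementary CA (radius r = 1) given its Wolfram number."""
    return {format(n, "03b"): (number >> n) & 1 for n in range(8)}


def preimages(number, block):
    """All blocks of length len(block) + 2 that the rule maps onto `block`."""
    f = rule_table(number)
    result = []
    for a in itertools.product("01", repeat=len(block) + 2):
        a = "".join(a)
        # Apply the local function to every window of three consecutive symbols.
        image = "".join(str(f[a[j:j + 3]]) for j in range(len(block)))
        if image == block:
            result.append(a)
    return result


for b in ("00", "01", "10", "11"):
    print(b, preimages(14, b))
```

Each printed list corresponds to one line of eq. (80): the probability of the 2-block at step $n + 1$ is the sum of the probabilities of the listed 4-blocks at step $n$.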

It is obvious that this system of equations cannot be iterated over $n$, because on the left hand side we have probabilities of blocks of length 2, and on the right hand side probabilities of blocks of length 4. Of course, not all these probabilities are independent, thus it will be better to rewrite the above using short form representation. Since among block probabilities of length 2 only 2 are independent, we can take only two of the above equations, and express all block probabilities occurring in them by their short form representation, using eq. (51). This reduces eq. (80) to

$$\begin{aligned}
P_{n+1}(0) &= 1 - P_n(0) + P_n(000), \\
P_{n+1}(00) &= 1 - 2P_n(0) + P_n(00) + P_n(000).
\end{aligned} \tag{81}$$

Although much simpler, the above system of equations still cannot be iterated, because on the right hand side we have an extra variable $P_n(000)$. To put it differently, one cannot reduce iterations of $F$ to iterations of a finite-dimensional map (in this case, a two-dimensional map).

Before we continue, let us remark that although the aforementioned reduction is not, in general, possible, one can, nevertheless, in some circumstances compute $(F^n\mu)([\mathbf{b}])$ for some selected (typically short) blocks $\mathbf{b}$ and for some reasonably simple $\mu$. Such calculations use an entirely different approach, and typically they exploit features of a particular CA rule, thus they cannot be easily generalized. For example, when $\mu$ is a Bernoulli measure, probabilities of blocks of length up to 3 have been computed for a number of binary cellular automata rules, using the method of preimage counting [15, 16, 17, 18, 19]. We will, however, not be concerned with these methods here. Instead, we will now turn our attention to approximate methods for computing $F^n\mu$.

Since the reduction of iterations of $F$ to iterations of a finite-dimensional map is, in general, impossible, we can try to perform this task in an approximate fashion. In the case of rule 14 discussed above, we could use Bayesian approximation for this purpose,

$$P_n(000) \approx \genfrac{}{}{1.5pt}{}{P_n(00) P_n(00)}{P_n(0)}. \tag{82}$$

Equations (81) would then become

$$\begin{aligned}
P_{n+1}(0) &= 1 - P_n(0) + \genfrac{}{}{1.5pt}{}{P_n(00)^2}{P_n(0)}, \\
P_{n+1}(00) &= 1 - 2P_n(0) + P_n(00) + \genfrac{}{}{1.5pt}{}{P_n(00)^2}{P_n(0)}.
\end{aligned} \tag{83}$$

The above is a formula for recursive iteration of a two-dimensional map, thus one could compute $P_n(0)$ and $P_n(00)$ for consecutive $n = 1, 2, \ldots$ without referring to any other block probabilities, in stark contrast with eq. (81). This, in fact, is the main idea behind the local structure approximation which will be formally introduced in the next section.
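To make the iteration concrete, the following Python sketch iterates the two-dimensional map of eq. (83), starting from the Bernoulli measure $\mu_{1/2}$, for which $P_0(0) = 1/2$ and $P_0(00) = 1/4$. The function names and the number of iterations are arbitrary choices made for this illustration.

```python
def thick_div(a, b):
    """'Thick bar' fraction of eq. (54): a/b, or 0 when b == 0."""
    return a / b if b != 0 else 0.0


def rule14_level2_step(p0, p00):
    """One iteration of the two-dimensional map of eq. (83)."""
    p000_approx = thick_div(p00 * p00, p0)       # Bayesian approximation, eq. (82)
    return 1 - p0 + p000_approx, 1 - 2 * p0 + p00 + p000_approx


# Bernoulli initial measure with equal probabilities of 0 and 1.
orbit = [(0.5, 0.25)]
for _ in range(50):
    orbit.append(rule14_level2_step(*orbit[-1]))

first_step = orbit[1]
for n in (0, 1, 2, 10, 50):
    print(n, orbit[n])
```

The first iterate is exact, because the Bernoulli measure coincides with its own Bayesian approximation; later iterates are only approximations of the true block probabilities.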

6. Approximate orbits of measures 

Given the difficulty of finding $F^n\mu$, H. Gutowitz et al. [1, 20] developed a method of approximating orbits of $F$, known as the local structure theory.

Following [1], let us define the scramble operator of order $k$, denoted by $\Xi^{(k)}$, to be a map from $\mathfrak{M}_\sigma(X)$, the set of shift-invariant measures on $X$, to the set of finite block measures of order $k$, such that

$$\Xi^{(k)}\mu = \widetilde{\mu}^{(k)}. \tag{84}$$

The sequence

$$\left\{ \left( \Xi^{(k)} F \Xi^{(k)} \right)^n \mu \right\}_{n=0}^{\infty} \tag{85}$$

will be called the local structure approximation of level $k$ of the exact orbit $\{F^n\mu\}_{n=0}^{\infty}$. Note that all terms of this sequence are Markov measures, thus the entire local structure approximation of the orbit lies in $\mathfrak{M}^{(k)}(X)$.

The main hypothesis of the local structure theory is that eq. (85) approximates the actual orbit $\{F^n\mu\}_{n=0}^{\infty}$ increasingly well as $k$ increases. The meaning of "approximates" is not rigorously defined in the original paper of H. Gutowitz et al. [1]. We will shortly prove that every point of the approximate orbit weakly converges to the corresponding point of the exact orbit as $k \to \infty$. To do this, we need the following useful result.

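Proposition 6.1 Let $k$ be a positive integer and $\mathbf{b} \in \mathcal{A}^*$. If $k \geq |\mathbf{b}| + 2r$, then

$$F\mu([\mathbf{b}]) = F\Xi^{(k)}\mu([\mathbf{b}]) = \Xi^{(k)} F \Xi^{(k)}\mu([\mathbf{b}]).$$

To prove it, note that $\mu([\mathbf{a}]) = \widetilde{\mu}^{(k)}([\mathbf{a}])$ for all blocks $\mathbf{a}$ of length up to $k$. The first equality of the proposition can be written as

$$\sum_{\mathbf{a} \in \mathcal{A}^{|\mathbf{b}|+2r}} w(\mathbf{b}|\mathbf{a}) \, \mu([\mathbf{a}]) = \sum_{\mathbf{a} \in \mathcal{A}^{|\mathbf{b}|+2r}} w(\mathbf{b}|\mathbf{a}) \, \widetilde{\mu}^{(k)}([\mathbf{a}]). \tag{86}$$

The equality holds when $|\mathbf{a}| \leq k$, that is, $|\mathbf{b}| + 2r \leq k$.

The second equality is a result of the fact that the scramble operator only modifies probabilities of blocks of length greater than $k$. Since $k \geq |\mathbf{b}| + 2r$, we have $|\mathbf{b}| \leq k$ and therefore $F\mu([\mathbf{b}]) = \Xi^{(k)} F\mu([\mathbf{b}])$. $\Box$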

Since $F^n$ can be considered as a cellular automaton rule of radius $nr$, when $k \geq |\mathbf{b}| + 2nr$ we have $F^n\mu([\mathbf{b}]) = F^n\Xi^{(k)}\mu([\mathbf{b}])$. We can insert as many $\Xi^{(k)}$ on the right hand side anywhere we want, and nothing will change, because $\Xi^{(k)}$ does not modify relevant block probabilities. This yields an immediate corollary.

Corollary 6.1 Let $k$ and $n$ be positive integers and $\mathbf{b} \in \mathcal{A}^*$. If $k \geq |\mathbf{b}| + 2nr$, then

$$F^n\mu([\mathbf{b}]) = \left( \Xi^{(k)} F \Xi^{(k)} \right)^n \mu([\mathbf{b}]).$$

This means that for a given $n$, measures of cylinder sets in the approximate measure $(\Xi^{(k)} F \Xi^{(k)})^n \mu$ converge to measures of cylinder sets in $F^n\mu$. By virtue of Theorem 4.2 we thus obtain the following result.

Theorem 6.1 Let $F$ be a cellular automaton and $\mu$ be a shift-invariant measure in $\mathfrak{M}_\sigma(X)$. Let $\nu_n^{(k)}$ be a local structure approximation of level $k$ of the measure $F^n\mu$, i.e., $\nu_n^{(k)} = (\Xi^{(k)} F \Xi^{(k)})^n \mu$. Then for any positive integer $n$, $\nu_n^{(k)} \Rightarrow F^n\mu$ as $k \to \infty$.

7. Local structure maps 

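A nice feature of Markov measures is that they can be entirely described by specifying probabilities of a finite number of blocks. This makes construction of finite-dimensional maps generating approximate orbits possible.

If $\nu_n^{(k)} = (\Xi^{(k)} F \Xi^{(k)})^n \mu$, then $\nu_n^{(k)}$ satisfies the recurrence equation

$$\nu_{n+1}^{(k)} = \Xi^{(k)} F \Xi^{(k)} \nu_n^{(k)}. \tag{87}$$

On both sides of this equation we have measures in $\mathfrak{M}^{(k)}(X)$, and these are completely determined by probabilities of blocks of length $k$. If $|\mathbf{b}| = k$, we obtain

$$\nu_{n+1}^{(k)}([\mathbf{b}]) = \Xi^{(k)} F \Xi^{(k)} \nu_n^{(k)}([\mathbf{b}]), \tag{88}$$

and, since $\Xi^{(k)}$ does not modify probabilities of blocks of length $k$, this simplifies to

$$\nu_{n+1}^{(k)}([\mathbf{b}]) = F \Xi^{(k)} \nu_n^{(k)}([\mathbf{b}]). \tag{89}$$

By the definition of $F$,

$$\nu_{n+1}^{(k)}([\mathbf{b}]) = \sum_{\mathbf{a} \in \mathcal{A}^{|\mathbf{b}|+2r}} w(\mathbf{b}|\mathbf{a}) \left( \Xi^{(k)} \nu_n^{(k)} \right)([\mathbf{a}]), \tag{90}$$

and, by the definition of Bayesian approximation,

$$\nu_{n+1}^{(k)}([\mathbf{b}]) = \sum_{\mathbf{a} \in \mathcal{A}^{|\mathbf{b}|+2r}} w(\mathbf{b}|\mathbf{a}) \genfrac{}{}{1.5pt}{}{\prod_{i=1}^{2r+1} \nu_n^{(k)}([\mathbf{a}_{[i,i+k-1]}])}{\prod_{i=1}^{2r} \nu_n^{(k)}([\mathbf{a}_{[i+1,i+k-1]}])}. \tag{91}$$

To simplify the notation, let us define $Q_n(\mathbf{c}) = \nu_n^{(k)}([\mathbf{c}])$. Then, using consistency conditions in order to obtain on the right hand side an expression involving only probabilities of blocks of length $k$, we rewrite the previous equation to take the form

$$Q_{n+1}(\mathbf{b}) = \sum_{\mathbf{a} \in \mathcal{A}^{|\mathbf{b}|+2r}} w(\mathbf{b}|\mathbf{a}) \genfrac{}{}{1.5pt}{}{\prod_{i=1}^{2r+1} Q_n(\mathbf{a}_{[i,i+k-1]})}{\prod_{i=1}^{2r} \sum_{c \in \mathcal{A}} Q_n(\mathbf{a}_{[i+1,i+k-1]}c)}. \tag{92}$$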



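The above equation can be written separately for all $\mathbf{b} \in \mathcal{A}^k$. If we arrange $Q_n(\mathbf{b})$ for all $\mathbf{b} \in \mathcal{A}^k$ in lexicographical order to form a vector $\mathbf{Q}_n$, we will obtain

$$\mathbf{Q}_{n+1} = L^{(k)}(\mathbf{Q}_n), \tag{93}$$

where $L^{(k)} : [0, 1]^{N^k} \to [0, 1]^{N^k}$ has components defined by eq. (92). $L^{(k)}$ will be called local structure map of level $k$. First $N^k - N^{k-1}$ components of $L^{(k)}$ will be denoted by $L^{(k)}_{\mathrm{Top}}$, and the remaining components by $L^{(k)}_{\mathrm{Bot}}$, and therefore the local structure map can be written as

$$\begin{bmatrix} \mathbf{Q}^{(k)}_{\mathrm{Top}} \\ \mathbf{Q}^{(k)}_{\mathrm{Bot}} \end{bmatrix} \mapsto \begin{bmatrix} L^{(k)}_{\mathrm{Top}}\big(\mathbf{Q}^{(k)}_{\mathrm{Top}}, \mathbf{Q}^{(k)}_{\mathrm{Bot}}\big) \\ L^{(k)}_{\mathrm{Bot}}\big(\mathbf{Q}^{(k)}_{\mathrm{Top}}, \mathbf{Q}^{(k)}_{\mathrm{Bot}}\big) \end{bmatrix}. \tag{94}$$

By Proposition 3.2, $\mathbf{Q}^{(k)}_{\mathrm{Bot}}$ can be expressed in terms of $\mathbf{Q}^{(k)}_{\mathrm{Top}}$,

$$\mathbf{Q}^{(k)}_{\mathrm{Bot}} = [0, \ldots, 0, 1]^T - \big(B^{(k)}\big)^{-1} A^{(k)} \mathbf{Q}^{(k)}_{\mathrm{Top}}, \tag{95}$$

and, therefore, only the "top" component of our map needs to be considered,

$$\mathbf{Q}^{(k)}_{\mathrm{Top}} \mapsto L^{(k)}_{\mathrm{Top}}\Big(\mathbf{Q}^{(k)}_{\mathrm{Top}},\ [0, \ldots, 0, 1]^T - \big(B^{(k)}\big)^{-1} A^{(k)} \mathbf{Q}^{(k)}_{\mathrm{Top}}\Big). \tag{96}$$

We will call the above map the reduced long form of the local structure map, and write it as

$$\mathbf{Q}^{(k)}_{\mathrm{Top}} \mapsto L^{(k)}_{\mathrm{red.\,long}}\big(\mathbf{Q}^{(k)}_{\mathrm{Top}}\big). \tag{97}$$

Now using eq. (48), we have $\mathbf{Q}^{(k)}_{\mathrm{Top}} = M^{(k)}_{\mathrm{short}} \mathbf{Q}^{(k)}_{\mathrm{short}}$, and we can change variables in eq. (96) from long to short block representation. This yields

$$\mathbf{Q}^{(k)}_{\mathrm{short}} \mapsto \big(M^{(k)}_{\mathrm{short}}\big)^{-1} L^{(k)}_{\mathrm{red.\,long}}\big(M^{(k)}_{\mathrm{short}} \mathbf{Q}^{(k)}_{\mathrm{short}}\big). \tag{98}$$

We will call the above map the reduced short form of the local structure map, and write it as

$$\mathbf{Q}^{(k)}_{\mathrm{short}} \mapsto L^{(k)}_{\mathrm{red.\,short}}\big(\mathbf{Q}^{(k)}_{\mathrm{short}}\big). \tag{99}$$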

As an example, consider rule 184 given by

$$\begin{aligned}
w(1|000) &= 0, & w(1|001) &= 0, & w(1|010) &= 0, & w(1|011) &= 1, \\
w(1|100) &= 1, & w(1|101) &= 1, & w(1|110) &= 0, & w(1|111) &= 1,
\end{aligned} \tag{100}$$

and suppose we wish to construct local structure map of level 2 for this rule. Let $P_n(\mathbf{b}) = F^n\mu([\mathbf{b}])$. Using eq. (76) we obtain for $r = 1$, $|\mathbf{b}| = 2$

$$P_{n+1}(\mathbf{b}) = \sum_{\mathbf{a} \in \mathcal{A}^4} w(\mathbf{b}|\mathbf{a}) P_n(\mathbf{a}). \tag{101}$$

Using the definition of $w(\mathbf{b}|\mathbf{a})$ given in eq. (77) and transition probabilities given in eq. (100) we obtain

$$\begin{aligned}
P_{n+1}(00) &= P_n(0000) + P_n(0001) + P_n(0010), \\
P_{n+1}(01) &= P_n(0011) + P_n(0100) + P_n(0101) + P_n(1100) + P_n(1101), \\
P_{n+1}(10) &= P_n(0110) + P_n(1000) + P_n(1001) + P_n(1010) + P_n(1110), \\
P_{n+1}(11) &= P_n(0111) + P_n(1011) + P_n(1111).
\end{aligned} \tag{102}$$

This set of equations describes the exact relationship between block probabilities at step $n + 1$ and block probabilities at step $n$. Note that 2-block probabilities at step $n + 1$ are given in terms of 4-block probabilities at step $n$, thus it is not possible to iterate these equations. Local structure map of level 2, given by eq. (92), becomes

n .nn^ ^"^°0)' Q„(00)'Q„(01) 



(Q„(00) + Q„(01))' (Q„(00) + g„,(oi))' 



Qn(00)gn(01)Qn(10) 

^ (Q„(00) + Q„(01)) (g„(10) + Q„(ll)) 
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g„(oo)g,,(oi)g„(ii 



(gn(oo) + g„(oi)) (g„,(io) + g„(ii)) 

gn(oo)g„(oi)g„(io) 



+ 



(gn(oo) + g„(oi)) (g„(io) + g„(ii)) 

gn(oi)'gn(io) 

(gn(oo) + g„(oi)) (g„(io) + g„(ii)) 
g„(ii)g„(io)g„(oo) 

(gn(oo) + g„(oi)) (g„(io) + g„(ii)) 

g„(oi)g„(ii)g,(io) 



(g„(oo) + g„(oi)) (g„(io) + g„(ii)) 
_ g„(oi)g„(ii)g„(io) g,,(io)gn(oo)' 

(gn(lO) + gn(ll))' (gn(OO) + Qn{01)f 

gn(oo)g„(oi)g„(io) g„(oi)g„(io)' 



(gn(oo) + g„(oi))2 (gn(oo) + g„(oi)) (g„(io) + g„(ii)) 

g,,(ii)'g„(io) 



(g„(io) + g„(ii))'' 



g„.(oi)g„(ii)^ 
gn+uiij — , ^ — , , 3 



(g„(io) + g„(ii)r 

g„(oi)g„(ii)g„(io) g„(ii: 



. (103) 

(gn(oo) + g„(oi)) (g„(io) + g„(ii)) (g„(io) + g„(ii))' 

Note that eq. (103) can be obtained from eq. (102) by replacing $P$'s by $Q$'s and expressing every 4-block probability by its Bayesian approximation of order 2.
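The construction of eq. (92) can be carried out mechanically for any deterministic rule. The Python sketch below is one possible implementation under stated assumptions: the function name `local_structure_step`, the dictionary representation of $\mathbf{Q}_n$, and the initial Bernoulli measure with density of ones $p = 0.3$ are all illustrative choices, not taken from the paper. For a deterministic CA, $w(\mathbf{b}|\mathbf{a})$ equals 1 exactly when $\mathbf{a}$ is mapped to $\mathbf{b}$, so the sum over $\mathbf{a}$ is organized by image.

```python
import itertools


def local_structure_step(Q, rule, k, r=1, alphabet="01"):
    """One application of eq. (92) for a deterministic CA.

    Q    : dict mapping every k-block (string) to its probability
    rule : dict mapping every (2r+1)-block to the output symbol (string)
    """
    new_Q = {"".join(b): 0.0 for b in itertools.product(alphabet, repeat=k)}
    for a in itertools.product(alphabet, repeat=k + 2 * r):
        a = "".join(a)
        num = 1.0
        for i in range(2 * r + 1):               # overlapping k-blocks of a
            num *= Q[a[i:i + k]]
        den = 1.0
        for i in range(1, 2 * r + 1):            # (k-1)-blocks via eq. (58)
            den *= sum(Q[a[i:i + k - 1] + c] for c in alphabet)
        weight = num / den if den != 0 else 0.0  # thick bar fraction, eq. (54)
        image = "".join(rule[a[j:j + 2 * r + 1]] for j in range(k))
        new_Q[image] += weight
    return new_Q


# Rule 184, cf. eq. (100), built from its Wolfram number.
rule184 = {format(n, "03b"): str((184 >> n) & 1) for n in range(8)}

p = 0.3                                          # assumed density of ones
prob = {"0": 1 - p, "1": p}
Q = {x + y: prob[x] * prob[y] for x in "01" for y in "01"}
for _ in range(100):
    Q = local_structure_step(Q, rule184, k=2)
print(Q)
```

With $k = 2$ and rule 184, each call evaluates the four expressions of eq. (103) at once. The sum $Q_n(00) + Q_n(01)$, which is the probability of a single 0, can be monitored along the iteration.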

According to Corollary 3.1, only the first two components of $[Q_n(00), Q_n(01), Q_n(10), Q_n(11)]$ are independent, that is, $Q_n(00)$ and $Q_n(01)$. This means that we can ignore the last two equations in (103), making in the first two equations the substitutions that follow from the consistency conditions, that is,

$$Q_n(10) = Q_n(01), \tag{104}$$

$$Q_n(11) = 1 - Q_n(00) - 2Q_n(01). \tag{105}$$

This yields reduced long form of the local structure map (cf. eq. (96)),

$$\begin{aligned}
Q_{n+1}(00) &= \frac{Q_n(00)^2}{Q_n(00) + Q_n(01)} + \frac{Q_n(00) Q_n(01)^2}{(Q_n(00) + Q_n(01))(-Q_n(01) + 1 - Q_n(00))}, \\
Q_{n+1}(01) &= \frac{Q_n(01) \left( 2Q_n(00)^2 - 2Q_n(00) + 4Q_n(00) Q_n(01) + Q_n(01)^2 - Q_n(01) \right)}{(Q_n(00) + Q_n(01))(Q_n(01) - 1 + Q_n(00))}.
\end{aligned} \tag{106}$$

We now proceed to produce reduced short form of this map. In short block representation, we choose $Q_n(0)$ and $Q_n(00)$ as independent blocks, and probabilities of all other blocks of length 2 can be expressed by them,

$$\begin{aligned}
Q_n(01) &= Q_n(0) - Q_n(00), \\
Q_n(10) &= Q_n(0) - Q_n(00), \\
Q_n(11) &= Q_n(00) - 2Q_n(0) + 1.
\end{aligned}$$

With this change of variables, eq. (106) becomes

$$\begin{aligned}
Q_{n+1}(0) &= Q_n(0), \\
Q_{n+1}(00) &= \frac{Q_n(00)^2}{Q_n(0)} + \frac{Q_n(00) \left( Q_n(0) - Q_n(00) \right)^2}{Q_n(0) \left[ 1 - Q_n(0) \right]}.
\end{aligned} \tag{107}$$

This is the reduced short form of the local structure map (cf. eq. (98)). Note that this form is not only simpler than the original local structure map, but it also makes it easier to see an important property of the map, namely the fact that the probability of 0 is invariant. This actually is true for the orbit of rule 184: probability of 0 (and 1) stays the same along the orbit. We have here, therefore, an example of a case where the local structure map "inherits" a property of the rule it approximates. In this case, it inherits the so-called additive invariant.

Not only does the map inherit the invariant from the exact orbit of rule 184, but it also converges to the "right" value. One can easily find its fixed points, determine their stability, and from there determine $\lim_{n \to \infty} Q_n(00)$. Since $Q_n(0)$ is constant, let us denote $Q_n(0) = 1 - p$, so that $Q_n(1) = p$. Eq. (107) then reduces to

$$Q_{n+1}(00) = \frac{Q_n(00)^2}{1 - p} + \frac{Q_n(00) \left( 1 - p - Q_n(00) \right)^2}{p(1 - p)}. \tag{108}$$

This nonlinear difference equation has three fixed points, $0$, $1 - p$ and $1 - 2p$. The second one, $1 - p$, is always unstable. The first one, $0$, is unstable for $p < 1/2$, and stable for $p > 1/2$. The third one, $1 - 2p$, is stable for $p < 1/2$, and unstable for $p > 1/2$. We can, therefore, write

$$\lim_{n \to \infty} Q_n(00) = \begin{cases} 1 - 2p, & p < 1/2, \\ 0, & p \geq 1/2. \end{cases} \tag{109}$$

Remarkably, this agrees with the exact limiting value of $P_n(00) = F^n\mu([00])$ for rule 184 provided that $\mu$ is a Bernoulli measure, as computed in [19]. Again, we can say that the local structure map in this case inherits the limiting value of the probability $P_n(00)$ from the exact orbit.
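The limit in eq. (109) is easy to confirm by iterating the one-dimensional map of eq. (108) directly. In the sketch below, the function names, the number of iterations, and the choice of a Bernoulli initial condition $Q_0(00) = (1-p)^2$ are assumptions made for illustration.

```python
def rule184_map(q, p):
    """Right hand side of eq. (108); q stands for Q_n(00), p for the density of ones."""
    return q * q / (1 - p) + q * (1 - p - q) ** 2 / (p * (1 - p))


def limit_q00(p, steps=2000):
    """Iterate eq. (108) starting from a Bernoulli measure, Q_0(00) = (1 - p)^2."""
    q = (1 - p) ** 2
    for _ in range(steps):
        q = rule184_map(q, p)
    return q


for p in (0.1, 0.3, 0.7, 0.9):
    predicted = 1 - 2 * p if p < 0.5 else 0.0     # eq. (109)
    print(p, limit_q00(p), predicted)
```

Values of $p$ close to $1/2$ are avoided here, because the two fixed points $0$ and $1 - 2p$ merge at $p = 1/2$ and convergence becomes slow.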

8. Conclusions 

We have formalized the idea of local structure theory and demonstrated that orbits of shift-invariant measures under probabilistic (or deterministic) CA can be approximated by orbits of $(N - 1)N^{k-1}$-dimensional maps, called reduced local structure maps. The paper presented a detailed procedure for construction of such maps. After this foundation has been laid out, further research is clearly needed. The main question which remains is the relationship between orbits of reduced local structure maps and exact orbits. Theorem 6.1 reveals one such relationship, namely that points of the orbits of the local structure map weakly converge to corresponding points of the exact orbit. Much more, however, can be said. For example, as we already noticed in the case of rule 184, the local structure map "inherits" an additive invariant from the exact orbit. One can prove that this is a general property which holds for an arbitrary CA rule with additive invariant(s). Are other important properties of CA, such as, for example, nilpotency or equicontinuity, "inherited" in a similar fashion? Can we rigorously prove that certain features of exact orbits are preserved when exact orbits are replaced by local structure approximated orbits? What are these features? These questions are currently under investigation and will be reported in a follow-up paper.

9. Acknowledgements 

The author acknowledges partial financial support from the Natural Sciences and Engineering Research Council of Canada (NSERC) in the form of a Discovery Grant. Some calculations on which this work is based were made possible by the facilities of the Shared Hierarchical Academic Research Computing Network (SHARCNET: www.sharcnet.ca) and Compute/Calcul Canada.

References 



[1] H. A. Gutowitz, J. D. Victor, and B. W. Knight, "Local structure theory for cellular automata," Physica D 28 (1987) 18-48.

[2] P. Kurka and A. Maass, "Limit sets of cellular automata associated to probability measures," Journal of Statistical Physics 100 (2000) 1031-1047.

[3] P. Kurka, "On the measure attractor of a cellular automaton," Discrete and Continuous Dynamical Systems (2005) 524-535.

[4] M. Pivato, "Ergodic theory of cellular automata," in Encyclopedia of Complexity and System Science, R. A. Meyers, ed. Springer, 2009.

[5] E. Formenti and P. Kurka, "Dynamics of cellular automata in non-compact spaces," in Encyclopedia of Complexity and System Science, R. A. Meyers, ed. Springer, 2009.

[6] S. Wolfram, "Statistical mechanics of cellular automata," Reviews of Modern Physics 55 (1983), no. 3, 601-644.

[7] H. J. Brascamp, "Equilibrium states for a one dimensional lattice gas," Communications in Mathematical Physics 21 (1971), no. 1, 56.

[8] M. Fannes and A. Verbeure, "On solvable models in classical lattice systems," Commun. Math. Phys. 96 (1984) 115-124.

[9] M. Fréchet, "Sur l'intégrale d'une fonctionnelle étendue à un ensemble abstrait," Bulletin de la S. M. F. 43 (1915) 248-265.

[10] A. Kolmogorov, Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer-Verlag, Berlin, 1933.

[11] H. Hahn, "Über die Multiplikation total-additiver Mengenfunktionen," Annali della Scuola Normale Superiore di Pisa 2 (1933), no. 4, 429-452.

[12] K. R. Parthasarathy, Introduction to Probability and Measure. Springer-Verlag, New York, 1977.

[13] A. N. Kolmogorov and Y. V. Prohorov, "Zufällige Funktionen und Grenzverteilungssätze," in Bericht über die Tagung Wahrscheinlichkeitsrechnung und Mathematische Statistik, pp. 113-126. Deutscher Verlag der Wissenschaften, Berlin, 1954.

[14] P. Billingsley, Convergence of Probability Measures. John Wiley & Sons, New York, 1968.

[15] H. Fuks and J. Haroutunian, "Catalan numbers and power laws in cellular automaton rule 14," Journal of Cellular Automata 4 (2009) 99-110.

[16] H. Fuks and A. Skelton, "Orbits of Bernoulli measure in asynchronous cellular automata," Dis. Math. Theor. Comp. Science AP (2011) 95-112.

[17] H. Fuks and A. Skelton, "Response curves for cellular automata in one and two dimensions - an example of rigorous calculations," International Journal of Natural Computing Research 1 (2010) 85-99, arXiv:1108.1987.

[18] H. Fuks, "Probabilistic initial value problem for cellular automaton rule 172," DMTCS proc. AL (2010) 31-44, arXiv:1007.1026.

[19] H. Fuks, "Exact results for deterministic cellular automata traffic models," Phys. Rev. E 60 (1999) 197-202, arXiv:comp-gas/9902001.

[20] H. A. Gutowitz and J. D. Victor, "Local structure theory in more than one dimension," Complex Systems 1 (1987) 57-68.



